Abstract: An evolution equation for a population of strings evolving under the genetic
operators selection, mutation and crossover is derived. The corresponding equation
describing the evolution of schematas is found by performing an exact coarse graining of this
equation. In particular, exact expressions for schemata reconstruction are derived, which
allow for a critical appraisal of the "building-block hypothesis" of genetic algorithms. A
further coarse graining is made by considering the contribution of all length-$l$ schematas
to the evolution of population observables such as fitness growth. As a test function for
investigating the emergence of structure in the evolution, the increase per generation of the
in-schemata fitness averaged over all schematas of length $l$, $\Delta_l$, is introduced. In finding
solutions of the evolution equations we concentrate more on the effects of crossover; in
particular we consider crossover in the context of Kauffman Nk models with k = 0, 2. For
k = 0, with a random initial population, in the first step of evolution the contribution
from schemata reconstruction is equal to that of schemata destruction, leading to a scale
invariant situation where the contribution to fitness of schematas of size $l$ is independent
of $l$. This balance is broken in the next step of evolution, leading to a situation where
schematas that are either much larger or much smaller than half the string size dominate
over those with $l \approx N/2$. The balance between block destruction and reconstruction is
also broken in a k > 0 landscape. It is conjectured that the effective degrees of freedom for
such landscapes are landscape connective trees that break down into effectively fit smaller
blocks, and not the blocks themselves. Numerical simulations confirm this "connective
tree hypothesis" by showing that correlations drop off with connective distance and not
with intrachromosomal distance.
1. Introduction

One of the most important steps in developing a qualitative or quantitative model of 
a system is to gain an understanding of the nature of the effective degrees of freedom 
of the system. This is equally true if one is considering static, equilibrium properties or 
dynamics; in the context of "simple" systems or of complex systems. An important feature 
that distinguishes the effective degrees of freedom is that their mutual interactions are not 
very strong; that is to say that they must have a certain degree of integrity. In this sense, 
the aim of developing an effective model of a system is to arrive at a description of the 
system in terms of relevant (e.g., "macroscopic") variables. 

Identifying the correct effective degrees of freedom in complex systems is generally 
speaking a very difficult task. To begin with, more often than not the effective degrees of 
freedom are scale dependent, where what one means by "scale" depends on the particular 
problem under consideration. In the case of evolution theory and genetic algorithms, one 
expects to find different effective degrees of freedom at different time scales. Generically 
if a system is complex at the relevant scale then it will admit a simple effective dynamics 
only in terms of complex degrees of freedom: one trades the complicated dynamics that 
results from the non-linear interactions of the many "elementary" degrees of freedom for 
the simpler dynamics of more complicated effective degrees of freedom. What one gains in 
the trade is effective predictability; what one loses is detail. 

It is well worth recalling in this context the example of spin glass models of neural 
networks [1, 2, 3]. In this case the effective degrees of freedom are the overlaps with a 
certain number of "patterns", each of which is related to a local extremum of the energy
landscape or Hamiltonian. Since a large number of uncorrelated patterns is involved in 
this effective representation it should be clear that the description of the effective degrees 
of freedom themselves requires a large amount of information: One gets a measure of the 
complexity of the system by the information in its effective degrees of freedom. Note that 
in this example the system's dynamics is guided by large-scale attracting structures (the 
patterns), the effective degrees of freedom (overlaps) being the instruments which measure 
how structure emerges as the system condenses from a disordered phase. 

Not all complex systems have a large-scale structure which can be described in terms 
of macroscopic variables with a "simple" effective dynamics. For example in a critical 
sandpile [4], the relevant macroscopic variable is the avalanche size, and nothing short of 
a detailed description of every grain of sand would allow one to predict the size of the 
next event. Some examples of structured complex systems besides the Hopfield models 
include the brain, gene expression in eukaryotic cells [5], and of course evolution theory 
and genetic algorithms, among many others. We know that these systems are structured 
because their behaviour is manifestly non-random; for instance neural dynamics must be 
structured if the brain is to be of any use! Yet in most cases we have no idea what the 
nature of this structure is, much less how to identify effective degrees of freedom. 

In this paper we will begin to analyse the notion of effective degree of freedom in the 
context of genetic algorithms (GA's) [6,7]. The claim that genetic algorithms are struc- 
tured complex systems is central to their designer's purpose, in that they yield intelligent 
solutions to complex optimization problems rather than a sophisticated sort of random 
search. We emphasize however that GA's form only one area of interest where the results 
and conclusions of this paper are applicable, some others being statistical mechanics [8] 
and biology [5], the Kauffman Nk model [9], and evolution theory [10]. 

Trying to ascertain what effective degrees of freedom a GA is using in order to arrive 
at an optimal solution is in the strict sense a nonsensical question — roughly equivalent 
to asking "what are the effective degrees of freedom of a block of material?" Of course, 
the answer depends on the type of material under consideration and its state, the effective
degrees of freedom of a superconductor being quite different to those of a spin-glass for
instance. However, it is not nonsensical to ask what the effective degrees of
freedom are in a generic type of fitness landscape. The fitness landscapes we choose to consider
as being representative of general classes of fitness landscapes are Kauffman's Nk models
with k = 0 and k = 2.

As in the example of spin glasses, the dynamics of genetic algorithms can be viewed as 
a condensation process in a rugged landscape. So again one expects the effective degrees of 
freedom to represent the emergence of certain structures, or "patterns", which are related
to local fitness optima. In GA theory one usually considers partly-specified patterns, called 
"schematas", and determines the fraction of all the individuals in the population which 
include a particular schemata, this being a measure of order comparable to the "overlap" 
of spin glass models. Since one does not know a priori which schematas lead to a useful 
set of effective degrees of freedom some hypothesis must be made to this effect. 

The standard conjecture about the effective degrees of freedom of genetic algorithms 
is the "building block hypothesis" [6,7], the essence of which is that a GA arrives at an 
optimal solution of a complex problem via the combination of short, fit schematas. In this 
paper we will present both analytic and numerical evidence that generically this is not 
the case. The argument for the block hypothesis is that large schematas are likely to be
"broken" by the crossover operator. Roughly speaking, the probability that a parent passes a
length-$l$ schemata down to its offspring drops off like $(N-l)/(N-1)$, where $N$ is the size of the string.

However, this argument neglects the possibility that a schemata be repaired, if the other 
parent has the part of the schemata that was broken by crossover; more importantly as it 
turns out, it neglects the possibility that a schemata be reconstructed from two parents 
that have incomplete parts of it. 

It is clear that the validity of the block hypothesis will depend on the nature of the 
fitness landscape. If there is a larger contribution to fitness from string bits that are 
widely separated then clearly large schematas will be favoured irrespective of the effect 
of crossover. On the contrary if the fitness landscape strongly favours smaller schemata 
this would lend support to the block hypothesis. However, the intuition behind the block
hypothesis is firmly based on the action of crossover, not on the pathologies of particular
landscapes. It is for this reason that we choose to consider the block hypothesis in the
context of Kauffman Nk models. In particular, the case k = 0 corresponds to the neutral
case where there are no bit-bit interactions to induce size dependence. In all cases, we will
assume that the fitness landscape is generic in the sense that there is no systematic bias 
in the fitness function that would favour one part of the string over another. 

The bulk of this paper is devoted to deriving an equation for the evolution of schematas, 
and from there a coarse-grained equation which gives the average contribution of schematas 
of size $l$ to the improvement of fitness. We will show that under general assumptions the
coarse-grained variable is closely related to the spatial correlation function, so it provides 
information about the size distribution of the effective degrees of freedom. We will apply 
this equation to particular situations, to analyse the effect of crossover on the emergence 
of correlations between distant bits in the strings. Both the theoretical analysis and the 
numerical simulations lead to a new conjecture about the effective degrees of freedom of 
direct-encoded genetic algorithms on an Nk-landscape, which we call the "connective tree
hypothesis".

The format of the paper will be as follows: in section 2, as this paper is not intended for 
a dedicated GA audience, we will give a brief overview of various elements of GA theory. In 
section 3 we will derive an evolution equation for the development of a population of strings 
under the genetic operators of selection, mutation and simple crossover. We then "coarse
grain" this equation to derive an effective evolution equation for the evolution of schematas
of size $l$ and order $N_2$, thus arriving at a generalization of the fundamental theorem of
schematas [6]. In section 4 we consider a further coarse graining considering the effects of
schematas of size $l$ but of any order $N_2 \leq l$. We consider especially the increment in fitness
per generation from such schematas. In section 5 we consider asymptotic solutions of the 
coarse grained evolution equation near a random initial population for a simple "neutral" 
fitness landscape and also make some comments about what happens near the ordered 
population limit. In section 6 we consider a more non-trivial landscape — a Kauffman Nk 
model [9] with k = 2. Finally in section 7 we summarize our conclusions. 

2. Genetic Algorithms and the Building Block Hypothesis 

GA's have become increasingly popular in the analysis of complex search and optimization
problems and in machine learning, one of their chief attributes being their robustness
(see [11] and references therein for a recent overview). One begins with a complex
optimization problem which depends on many variables. The variables and the rules that
govern them are subsequently coded in the form of a population of strings/"chromosomes".
The latter consist of a set of symbols/"alleles", each symbol taking values defined over an
alphabet. Here we will only consider binary codification though our general conclusions
apply also to alphabets of higher cardinality. We will denote by $A_s$ the space of possible
states of a string.

The population is evolved under the action of a set of genetic operators. Reproduction
can be implemented in many different ways, all of which have the effect of increasing the relative
numbers of "fit" strings between one generation and another, fitness being measured by a
fitness function, $f : A_s \to \mathbb{R}^+$. The role of most other genetic operators is to encourage
diversity in the population. In this paper we will restrict our attention to simple crossover
and mutation. The former is a type of recombination and involves the splitting of two
parents, $C_i, C_j \in A_s$, at a particular crossover point $k$, and the subsequent juxtaposition
and recombination of the left half of $C_i$ with the right half of $C_j$ and the right half of
$C_i$ with the left half of $C_j$, left and right being defined relative to the crossover point $k$.
As mentioned, the point of genetic operators such as crossover is to encourage population
diversity. Optimal strings that do not appear in the initial population cannot subsequently
appear through the effects of reproduction alone. Crossover is one method for generating
fit strings that weren't originally in the population of a given generation. Mutation on the
other hand offers a form of insurance: if a particular bit is lost it is irrecoverable
using only reproduction and crossover, whereas mutation offers a way to recover lost bits that
may subsequently be important in the construction of an optimum string. Speaking intuitively,
we may say that relative to an optimum string, mutation produces "errors" whereas
crossover merely shuffles them around. We will find this distinction to be an important
one when we come to critique the building block hypothesis later. Using the language
of statistical mechanics the evolution of the GA is a competition between the "ordering"
tendency of reproduction and the "disordering" effects of crossover and mutation.
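
To make the three operators concrete, the following is a minimal Python sketch of one generation of a simple GA on binary strings. It is an illustration under stated assumptions rather than a prescribed implementation: the function names, the use of fitness-proportional ("roulette wheel") selection, the random pairing of parents, and the convention that the cut point `k` counts the bits in the left part are all choices made here for illustration.

```python
import random

def one_point_crossover(a, b, k):
    """Swap the tails of two equal-length bit strings.
    Assumed convention: k is the number of bits in the left part (1 <= k <= N-1)."""
    return a[:k] + b[k:], b[:k] + a[k:]

def mutate(s, p, rng):
    """Flip each bit independently with probability p."""
    return "".join(("1" if c == "0" else "0") if rng.random() < p else c for c in s)

def select(population, fitness, rng):
    """Fitness-proportional selection of a new population of the same size.
    Fitness values are assumed to be positive."""
    weights = [fitness(s) for s in population]
    return rng.choices(population, weights=weights, k=len(population))

def next_generation(population, fitness, p_c, p, rng):
    """Selection, then crossover with probability p_c per pair, then mutation.
    The population size is assumed to be even so that all parents are paired."""
    parents = select(population, fitness, rng)
    rng.shuffle(parents)
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        if rng.random() < p_c:
            k = rng.randint(1, len(a) - 1)  # one of the N-1 possible cut points
            a, b = one_point_crossover(a, b, k)
        children += [mutate(a, p, rng), mutate(b, p, rng)]
    return children
```

Selection alone can only reweight strings that are already present; the crossover and mutation steps are the only places where new strings can enter the population.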

The language used in GA theory obviously owes much to evolutionary biology, indeed 
the whole point of GA's was to try to adapt the methods used by adaptive systems in 
nature in the context of artificial systems; selection, mutation and recombination being 
extremely important elements in the search for "fit" organisms. In the discussion above, 
for instance, a string could represent a protein chain of a certain size and the possible 
values of a symbol the number of possible alleles at a particular site on the chain. 

Theoretical analysis of how a GA seeks an optimum solution has focussed on the notion 
of schematas. If we consider strings of $N$ bits, a schemata is a subset, $N_2 \leq N$, of bits
defining a certain "word" constructed from the alphabet. In the $N - N_2$ positions not
defined by the schemata one does not care about the value of the bit and this is taken into
account by use of the metasymbol, or "wildcard", *. For an alphabet of size $m$ there are
$\alpha_\xi = (m+1)^N$ possible schematas for a particular string. The total number of possible
schematas in the population is $n_\xi \leq n(m+1)^N$, the exact number depending on the
population diversity. For a totally organised population $n_\xi = \alpha_\xi$.

The essential idea behind the notion of schemata is that the GA arrives at an optimum 
solution through combining fit schematas. As each string is an example of $\sim 2^N$ schematas
it is clear that a very large number of them are being processed simultaneously by the
GA, a phenomenon known as implicit parallelism [6]. Of course, not all these schematas
survive crossover, which leads us to consider the size of a schemata, $l$, which is defined as
$l = j - i + 1$, where $i$ and $j$ are the positions of the first and last of the $N_2$ defining elements of the schemata
respectively. In terms of reproduction alone there is no preference for short versus long
schematas, except as might be induced by the fitness function itself. Equally, mutation 
shows no favour for one or the other. However, if one considers the effects of crossover, 
purely in terms of the crossover point itself there is a higher probability to "break" a long 
schemata than a short one. This of course neglects the possibility of reconstructing a 
schemata even though it does not appear in either of the parents involved in the crossover 
process. This apparent disfavour for large schematas imposed by crossover has led to 
what is known as the "building block" hypothesis which claims that the joint effect of 
reproduction and crossover is to favour highly fit but short schematas which propagate 
from generation to generation exponentially. It is these highly fit, short schematas which 
are then considered to be the effective degrees of freedom in the system, the GA building 
a better solution through combining small sub-solutions. 
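
The notions of order, size and schemata membership can be sketched in a few lines of Python. This is only an illustration: the helper names are ours, schematas are written as strings over `0`, `1` and the wildcard `*`, and the size is assumed to be $l = j - i + 1$ for first and last defining positions $i$ and $j$, so that a schemata of size $l$ offers $l - 1$ internal crossover points.

```python
def order(schema):
    """Number of defining (non-wildcard) positions, N_2."""
    return sum(c != "*" for c in schema)

def size(schema):
    """Size l = j - i + 1 (assumed definition), with i and j the first and
    last defining positions; 0 for the schemata with no defining bits."""
    defined = [pos for pos, c in enumerate(schema) if c != "*"]
    return defined[-1] - defined[0] + 1 if defined else 0

def contains(string, schema):
    """True if the string matches the schemata on every defining position."""
    return all(c == "*" or c == s for s, c in zip(string, schema))

def unbroken_probability(schema):
    """Probability that a uniformly chosen cut point (one of N-1) does not
    fall inside the schemata: (N - l) / (N - 1)."""
    N, l = len(schema), size(schema)
    return (N - l) / (N - 1)
```

For example, `*1*0**` has order 2 and size 3, and two of the five cut points of a six-bit string fall inside it. This counts destruction only; it says nothing about reconstruction from two parents that each carry part of the schemata.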

3. String Evolution Equation 

In this section we will derive an equation that describes the evolution of a GA induced 
by the effects of the three genetic operators: selection, crossover and mutation. In particular
we will consider the change in number $n(\xi,t)$ of strings that contain a particular
schemata $\xi$, of order $N_2$ and size $l \geq N_2$, as a function of time (generation) in a population
of size n. It is worth pointing out here that a schemata itself is already a coarse grained 
degree of freedom in the sense that to calculate any properties of a schemata, such as its 
fitness, one needs to take a population average. 

We will first derive evolution equations for the "microscopic" degrees of freedom them- 
selves — the strings. Considering first selection in the absence of mutation or crossover 
one has

$$P(C_i,t+1) = P'(C_i,t) \tag{1}$$

where $P'(C_i,t) = (f(C_i,t)/\bar{f}(t))\,P(C_i,t)$, $f(C_i,t)$ is the fitness of string $C_i$ at time $t$,
$P(C_i,t) = n(C_i,t)/n$ and $\bar{f}(t) = \sum_i f(C_i,t)P(C_i,t)$ is the average string fitness. In (1)
we are neglecting fluctuations in the numbers $n(C_i,t)$, an approximation which should be
reasonable as long as the population is not too "sparse"; we will return to this point later.
Clearly, as previously mentioned, the effect of reproduction is to augment the number of
fit strings, fit here meaning $f(C_i,t) > \bar{f}(t)$, and to decrease the number of unfit strings,
where by unfit we mean $f(C_i,t) < \bar{f}(t)$. In terms of reproduction alone it is simple to prove
that average fitness is a Lyapunov function, increasing monotonically as a function of time.
As also discussed in the last section, the trouble with using selection as the sole genetic
operator is that the search space for optima is restricted to that of the initial population.
In complex systems this number will usually be negligible compared to the size of the total
search space. This implies that finite size effects are important. The theory of branching
processes may be a more suitable framework in this regard [12].
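
The selection step can be illustrated numerically. The sketch below, with made-up fitness values and initial proportions chosen purely for illustration, iterates proportional selection on a distribution over four strings. It shows the two properties mentioned above: the average fitness does not decrease, and a string absent from the initial population never appears.

```python
def mean_fitness(P, f):
    """Average string fitness: sum over strings of f(C) P(C)."""
    return sum(f[c] * P[c] for c in P)

def select(P, f):
    """One step of proportional selection: P'(C) = f(C) P(C) / mean fitness."""
    fbar = mean_fitness(P, f)
    return {c: f[c] * P[c] / fbar for c in P}

# Illustrative (hypothetical) fitness values; the fittest string "11" is absent at t = 0.
f = {"00": 1.0, "01": 2.0, "10": 2.0, "11": 3.0}
P = {"00": 0.5, "01": 0.3, "10": 0.2, "11": 0.0}

history = [mean_fitness(P, f)]
for _ in range(5):
    P = select(P, f)
    history.append(mean_fitness(P, f))
```

Here the population converges on the best strings initially present (fitness 2), not on the global optimum, which is the restriction of the search space referred to in the text.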

Including the effects of mutation but not crossover gives rise to the quasi-species
model [13], with evolution equation

$$P(C_i,t+1) = \mathcal{P}(C_i)P'(C_i,t) + \sum_{C_j\neq C_i}\mathcal{P}(C_j\to C_i)P'(C_j,t) \tag{2}$$

where $\mathcal{P}(C_i) = \prod_{k=1}^{N}(1-p(k))$ is the probability that string $i$ remains unmutated, $p(k)$
being the probability of mutation of bit $k$ which we assume to be a constant, though the
equations are essentially unchanged if we also include a dependence on time. $\mathcal{P}(C_j\to C_i)$ is
the probability that string $j$ is mutated into string $i$,

$$\mathcal{P}(C_j\to C_i) = \prod_{k\in\{C_j - C_i\}}p(k)\prod_{k\in\{C_j - C_i\}_c}(1-p(k)) \tag{3}$$

where $\{C_j - C_i\}$ is the set of bits that differ between $C_j$ and $C_i$ and $\{C_j - C_i\}_c$, the complement
of this set, is the set of bits that are the same. In the limit where the mutation rate $p$
is uniform, $\mathcal{P}(C_i) = (1-p)^N$ and $\mathcal{P}(C_j\to C_i) = p^{d_H(i,j)}(1-p)^{N-d_H(i,j)}$, where $d_H(i,j)$ is
the Hamming distance between the strings $C_i$ and $C_j$. The behaviour of the solutions of
equation (2) has been much discussed in the literature (see for example [8] and references
therein), although mainly in the context of a flat fitness landscape. One of the principal
features of interest is the existence of an "error threshold" separating an "ordered" (selection
dominated) phase from a "disordered" (mutation dominated) phase which manifests
itself as a second order phase transition at a certain critical mutation rate.
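
For a uniform mutation rate the transition probabilities depend only on Hamming distance, which makes the mutation step easy to sketch. The following Python fragment is an illustration with hypothetical helper names; it takes the post-selection distribution $P'$ as a dictionary over all $2^N$ strings and applies the mutation update, with the unmutated term included as the zero-distance case of the same formula.

```python
from itertools import product

def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def mutation_prob(src, dst, p):
    """Probability that src mutates into dst at uniform rate p:
    p^d (1-p)^(N-d), with d the Hamming distance (d = 0 gives (1-p)^N)."""
    d = hamming(src, dst)
    return p ** d * (1 - p) ** (len(src) - d)

def mutate_distribution(P_sel, p):
    """Apply mutation to a post-selection distribution P'(C)."""
    return {ci: sum(mutation_prob(cj, ci, p) * P_sel[cj] for cj in P_sel)
            for ci in P_sel}

# Example: all probability on "000" before mutation, N = 3, p = 0.1.
strings = ["".join(bits) for bits in product("01", repeat=3)]
P_sel = {s: 0.0 for s in strings}
P_sel["000"] = 1.0
P_new = mutate_distribution(P_sel, 0.1)
```

Because the transition probabilities out of any string sum to one, the update conserves total probability.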

We will now consider the effects of crossover without mutation. This is a much less
studied case theoretically (though see [14]), but one that is very important from the point
of view of effective degrees of freedom, since unlike mutation it is sensitive to the linear
disposition of bits along the string. It also plays a very important role in biology. With
crossover the evolution equation can be written in the form

$$P(C_i,t+1) = P'(C_i,t) - \frac{p_c}{N-1}\sum_{C_j\neq C_i}\sum_{k=1}^{N-1}\mathcal{C}^{(1)}_{C_iC_j}(k)P'(C_i,t)P'(C_j,t) + \frac{p_c}{N-1}\sum_{C_j\neq C_i}\sum_{C_l\neq C_i}\sum_{k=1}^{N-1}\mathcal{C}^{(2)}_{C_jC_l}(k)P'(C_j,t)P'(C_l,t) \tag{4}$$

where $p_c$ is the probability to implement crossover in the first place,

$$\mathcal{C}^{(1)}_{C_iC_j}(k) = \theta(d^L_H(i,j))\,\theta(d^R_H(i,j)) \tag{5}$$

and

$$\mathcal{C}^{(2)}_{C_jC_l}(k) = \frac{1}{2}\left(\delta(d^L_H(i,j))\,\delta(d^R_H(i,l)) + \delta(d^R_H(i,j))\,\delta(d^L_H(i,l))\right) \tag{6}$$

where $d^R_H(i,j)$ is the Hamming distance between the right halves of the strings $C_i$ and $C_j$,
"right" being defined relative to the crossover point $k$. The other quantities are defined
analogously (with $\theta(0) = 0$, so that (5) is nonzero only when the strings differ on both sides
of the cut). $\mathcal{C}^{(1)}_{C_iC_j}(k)$ is the probability that, given that $C_i$ was one of the parents, it is
destroyed by the crossover process. $\mathcal{C}^{(2)}_{C_jC_l}(k)$ is the probability that, given that neither
parent was $C_i$, it is created by the crossover process, so this represents a gain term. It
is naturally much easier to destroy an individual string by crossover than to create it, hence
$\mathcal{C}^{(2)}_{C_jC_l}(k)$ is a very sparse matrix. $\mathcal{C}^{(2)}_{C_jC_l}(k)$ represents a contact interaction term in Hamming
space. Another important property of $\mathcal{C}^{(1)}_{C_iC_j}(k)$ and $\mathcal{C}^{(2)}_{C_jC_l}(k)$ is that they are completely
population independent, depending only on string configurations and not string numbers.

In the case of mutations and selection without crossover the non-equilibrium evo- 
lution equation has been mapped into an equilibrium statistical mechanics problem us- 
ing transfer matrix techniques [15], where the role of inverse temperature is played by 
$\beta = \frac{1}{2}\log(p/(1-p))$. One can also find an analogy for the crossover operator which
can provide a more intuitive understanding of its effects. Imagine a "population" of n 
one-dimensional Ising chains in a strong magnetic field h, where the effects of spin-spin 
couplings may be neglected. We denote spin up by 1 and spin down by 0. The "fitness" 
of string $i$ is simply $f(C_i) = h(n_1 - n_0)$, where $n_1$ and $n_0$ are the numbers of up and down spins. Clearly selection will favour strings with large
values of $n_1$ relative to $n_0$. So what are the effects of crossover? Consider two selected
chains of size N = 7: 1111000 and 0100111. One may think of the first chain as being 
composed of a domain of up spins of size 4 and a domain of down spins of size 3 separated 
by a domain wall or "kink". Similarly, the second chain consists of two domains of up 
spins of sizes 1 and 3, and two of down spins of sizes 1 and 2. These domains are separated 
by one kink and two anti-kinks, so that the two chains together contain two kinks and two anti-kinks. If the crossover occurs at k = 5 the resultant chains are
1111111 and 0100000. Chain number one is now "homogeneous", containing no kinks or 
anti-kinks, whilst chain two contains a kink-anti-kink pair. Thus the action of crossover 
has been the annihilation of a kink-anti-kink pair. A crossover at k = 3 would have 
yielded: 1100111 and 0111000, each chain now having one kink-anti-kink pair. One can 
think of this as just a kink-kink scattering process that has conserved the total number of 
kinks and anti-kinks. Finally, a crossover at k = 6 would give 1111011 and 0100100 where 
now there are three kink-anti-kink pairs, one in the first chain and two in the second. In 
this case there has been kink-anti-kink creation. Thus we see here that crossover may be 
interpreted as creation, annihilation and scattering of kinks — domain walls, thought of 
as topological defects. Here, the fitness landscape is such as to favour spin chains without 
topological defects. This is because we are considering a "ferromagnetic" fitness landscape. 
If the fitness landscape were such as to favour 0's in odd positions and 1's in even positions
then an optimum chain would be 0101010, i.e. an inhomogeneous state with a maximum 
number of kink-anti-kink pairs. Generically then, crossover may be thought of as inducing 
interactions between "domains". Exactly what type of domain is favoured is of course a 
function of what is the particular fitness function of interest. It is also worth noting that 
crossover without selection preserves the total number of 0's and 1's in
any given bit position in the population. Thus for instance, if we consider a non-optimum
string as having "errors" relative to an optimum string, pure crossover without selection
cannot change the total number of errors in the population; it can only shuffle them around.
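
The three crossover examples above can be checked with a short Python sketch. The helper names are ours. Following the examples in the text, the crossover point `k` is taken here to be the (1-indexed) position at which the right part begins, and kinks and anti-kinks are counted together as domain walls.

```python
def crossover(a, b, k):
    """One-point crossover; the right part starts at bit k (1-indexed),
    which is the convention implied by the examples in the text."""
    return a[:k - 1] + b[k - 1:], b[:k - 1] + a[k - 1:]

def walls(chain):
    """Number of domain walls (kinks plus anti-kinks) in a chain."""
    return sum(x != y for x, y in zip(chain, chain[1:]))

def ones_per_position(chains):
    """Number of 1's at each bit position, summed over the chains."""
    return [sum(c[pos] == "1" for c in chains) for pos in range(len(chains[0]))]

a, b = "1111000", "0100111"
results = {k: crossover(a, b, k) for k in (5, 3, 6)}
wall_totals = {k: walls(c) + walls(d) for k, (c, d) in results.items()}
```

The two parents carry four domain walls in total. The cut at k = 5 leaves two (annihilation), the cut at k = 3 leaves four (scattering), and the cut at k = 6 leaves six (creation). In every case `ones_per_position` is unchanged, which is the statement that crossover only shuffles errors around.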

Equation (4) is an extension of the "schema theorem", or fundamental theorem of
GA's [6,7], which states that for a schemata, $\xi$, of size $l$

$$P(\xi,t+1) \geq P'(\xi,t)\left(1 - p_c\frac{l-1}{N-1}\right), \tag{7}$$

to the case where the schemata of interest is the entire string (an analogous equation was
derived in [16]). The evolution equation we have derived takes into account exactly, given
the approximation of a large population, the effects of destruction and reconstruction of
strings.

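Combining the effects of both crossover and mutation, where we assume that mutation
is carried out after crossover, we have the evolution equation

$$P(C_i,t+1) = \mathcal{P}(C_i)P_c(C_i,t) + \sum_{C_j\neq C_i}\mathcal{P}(C_j\to C_i)P_c(C_j,t) \tag{8}$$

where

$$P_c(C_i,t) = P'(C_i,t) - \frac{p_c}{N-1}\sum_{C_j\neq C_i}\sum_{k=1}^{N-1}\mathcal{C}^{(1)}_{C_iC_j}(k)P'(C_i,t)P'(C_j,t) + \frac{p_c}{N-1}\sum_{C_j\neq C_i}\sum_{C_l\neq C_i}\sum_{k=1}^{N-1}\mathcal{C}^{(2)}_{C_jC_l}(k)P'(C_j,t)P'(C_l,t)$$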



The various evolution equations exhibit different dynamical fixed points. In the "low 
temperature limit" $p \to 0$, in a non-trivial fitness landscape, one has stable fixed points at
the local fitness extrema. If $f(C_i,t) > f(C_j,t)\ \forall C_j : \mathcal{P}(C_j\to C_i) > 0$, then

$$n(C_i) = n, \qquad n(C_j) = 0 \tag{9}$$

is a stable fixed point of (8). These evolution equations are exact only in the approximation
of an infinite population. For a finite population there are no stable fixed points (for $p \neq 0$)
due to the effect of fluctuations in the reproduction process. This can be seen in the simple
example of two types of string, A and B, with a population of size 10 and fitnesses $f_A$
and $f_B$ ($f_A > f_B$). If we start with equal proportions of A and B the probability that
$n_A \to 6$ relative to the probability that $n_A \to 4$ is $(f_A/f_B)^2$. So the effect of fitness
is to increase the probability that we have more A's in the population. Consider now if
the initial population is $n_A = 1$, $n_B = 9$. The probability that A will disappear at the
first time step is $S_A = (1 + f_A/(9f_B))^{-10}$. In a flat fitness landscape $S_A = 0.35$. For
$f_A = 2f_B$, $S_A = 0.13$ and for $f_A = 10f_B$, $S_A \approx 0.0006$. We see then that unless the fitness
advantage of A over B is quite pronounced there is a non-trivial probability that the fitter 
fixed point is not reached, whereas reaching the latter is an inevitable conclusion of the 
evolution equation neglecting fluctuations. In general, the Neutral Theory predicts that 
the selective coefficient must be greater than 1/n to ensure that selection dominates over 
random drift. 
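
The finite-population numbers in the two-string example can be reproduced with a short sketch. It assumes, as an illustration, that each of the $n$ members of the next generation is drawn independently, with A chosen with probability $n_A f_A/(n_A f_A + n_B f_B)$; the function names are ours.

```python
from math import comb

def pick_a_probability(n_a, n_b, f_a, f_b):
    """Probability that a single fitness-proportional draw yields an A."""
    return n_a * f_a / (n_a * f_a + n_b * f_b)

def count_probability(k, n_a, n_b, f_a, f_b):
    """Binomial probability that the next generation contains exactly k A's."""
    n, q = n_a + n_b, pick_a_probability(n_a, n_b, f_a, f_b)
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

def extinction_probability(f_a, f_b, n_a=1, n_b=9):
    """Probability that no A is drawn at all in the next generation."""
    return count_probability(0, n_a, n_b, f_a, f_b)
```

With one A among ten strings, `extinction_probability(1.0, 1.0)` and `extinction_probability(2.0, 1.0)` round to 0.35 and 0.13 respectively, and starting from five A's and five B's the ratio of the probabilities of obtaining six and four A's equals $(f_A/f_B)^2$.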

We now turn our attention to the derivation of an evolution equation for schematas, 
$\xi$, of order $N_2$ and size $l$. Before doing this it is convenient to return to equation (4) to see
that the notions of schemata and coarse graining appear very naturally when considering
crossover of strings. Considering the destruction term: the matrix (5) restricts the sum
to those $C_j$ that differ from $C_i$ in at least one bit both to the left and to the right of the
crossover point. One can convert the sum over $C_j$ into an unrestricted sum by subtracting
off those $C_j$ that have $d^L_H(i,j) = 0$ and/or $d^R_H(i,j) = 0$. Similarly one may write the
reconstruction term as

$$\frac{p_c}{N-1}\sum_{k=1}^{N-1}\left(\sum_{C_j}\sum_{C_l}\mathcal{C}^{(2)}_{C_jC_l}(k)P'(C_j,t)P'(C_l,t) - 2\sum_{C_j\neq C_i}\mathcal{C}^{(2)}_{C_iC_j}(k)P'(C_i,t)P'(C_j,t) - P'(C_i,t)P'(C_i,t)\right) \tag{10}$$
The second and third terms cancel with corresponding expressions from the destruction
term hence (10) can now be written as

$$\frac{p_c}{N-1}\sum_{k=1}^{N-1}P'(C_i^L,t)P'(C_i^R,t) \tag{11}$$

where $C_i^L$ is the part of $C_i$ to the left of the crossover point and correspondingly for $C_i^R$.
However, by definition

$$\bar{f}(C_i^L,t) = \frac{n\bar{f}(t)}{n_{C_i^L}}\sum_{C_j\supset C_i^L}P'(C_j,t) \tag{12}$$

where $n_{C_i^L}$ is the total number of strings in the population that contain $C_i^L$. As $\bar{f}(C_i^L,t)$
is the average fitness of the substring $C_i^L$, one can think of this substring as a schemata,
likewise for $C_i^R$. In terms of these "schematas" the final form of the string equation is

$$P(C_i,t+1) = P'(C_i,t) - \frac{p_c}{N-1}\sum_{k=1}^{N-1}\left(P'(C_i,t) - P'(C_i^L,t)P'(C_i^R,t)\right) \tag{13}$$

with

$$P'(C_i^L,t) = \sum_{C_j\supset C_i^L}P'(C_j,t) \tag{14}$$

and similarly for $P'(C_i^R,t)$.

One sees that crossover very naturally introduces the notion of coarse graining even
though we are working in terms of the microscopic degrees of freedom, the strings.
The reconstruction probability depends on the relative fitness of strings that contain the
constituent elements of $C_i$, but given that there can be many strings that contain these elements one
must take an average over these strings. In this sense we are integrating out the "degrees
of freedom" represented by the bits that are not contained in $C_i^L$ or $C_i^R$. Equation (13)
shows that the effects of reconstruction will outweigh destruction if the parts of a string
are more selected than the whole.
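
As a consistency check, the coarse-grained form of the string equation can be compared with a direct enumeration of parent pairs and cut points. The Python sketch below is illustrative only: the function names are ours, the post-selection distribution $P'$ is given as a dictionary over all $2^N$ strings, the cut point `k` counts the bits in the left part, and only one of the two offspring of each crossover is tracked (the other has the same distribution by symmetry).

```python
from itertools import product

def all_strings(N):
    return ["".join(bits) for bits in product("01", repeat=N)]

def crossover_by_enumeration(P_sel, p_c):
    """Offspring distribution from enumerating both parents and all N-1 cuts."""
    N = len(next(iter(P_sel)))
    out = {c: (1 - p_c) * P_sel[c] for c in P_sel}
    for a in P_sel:
        for b in P_sel:
            weight = p_c * P_sel[a] * P_sel[b] / (N - 1)
            for k in range(1, N):
                out[a[:k] + b[k:]] += weight   # left part of a, right part of b
    return out

def crossover_coarse_grained(P_sel, p_c):
    """Same step written as P' - p_c/(N-1) * sum_k (P' - P'(left) P'(right))."""
    N = len(next(iter(P_sel)))
    out = {}
    for c in P_sel:
        total = 0.0
        for k in range(1, N):
            p_left = sum(P_sel[s] for s in P_sel if s[:k] == c[:k])
            p_right = sum(P_sel[s] for s in P_sel if s[k:] == c[k:])
            total += P_sel[c] - p_left * p_right
        out[c] = P_sel[c] - p_c / (N - 1) * total
    return out

# Hypothetical post-selection distribution on N = 4 bits.
strings = all_strings(4)
norm = sum(range(1, len(strings) + 1))
P_sel = {s: (rank + 1) / norm for rank, s in enumerate(strings)}
direct = crossover_by_enumeration(P_sel, 0.8)
coarse = crossover_coarse_grained(P_sel, 0.8)
```

The two dictionaries agree to rounding error, and both remain normalised, which reflects the fact that destruction and reconstruction are accounted for exactly.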

Before deriving a schemata evolution equation including crossover and mutation let us 
consider the effects of reproduction alone. The proportion of elements of the population, 
$P(\xi,t)$, that contain $\xi$ satisfies the evolution equation

$$P(\xi,t+1) = P'(\xi,t) \tag{15}$$

where $P'(\xi,t) = (\bar{f}(\xi,t)/\bar{f}(t))\,P(\xi,t)$,

$$\bar{f}(\xi,t) = \frac{\sum_{C_i\supset\xi}f(C_i,t)\,n(C_i,t)}{n(\xi,t)}, \tag{16}$$

the sum is over all strings $C_i$ that contain $\xi$, and $\bar{f}(t) = \sum_\xi\bar{f}(\xi,t)P(\xi,t)/\sum_\xi P(\xi,t)$ is
the average fitness per string or per schemata of the population. Note that the sum over
strings that contain $\xi$ is a sum over the possible values of the bits that are not definite
elements of $\xi$, i.e. the wildcards. In this sense, as above in (13), "degrees of freedom"
have been integrated out of the problem and (15) represents an exact coarse-graining of 
the original string evolution equation. 

Note that the sum over different binary words on the $N_2$ defining bits of the schemata
is a partition of the identity, i.e.

$$\sum_{\text{words}}P(\xi,t) = 1 \tag{17}$$

If we sum also over different possible schemata configurations we have

$$\sum_{\xi}1 = \sum_{N_2=1}^{N}{}^{N}C_{N_2}\sum_{\text{words}}1 = 3^N - 1 \tag{18}$$

which is just the total number of schematas (except for the order 0 schemata with no
defining bits).
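
Both statements are easy to verify by enumeration for a small binary example. In the Python sketch below the helper names and the example distribution are ours; schematas are written as strings over `0`, `1` and `*`.

```python
from itertools import product

def contains(string, schema):
    """True if the string matches the schemata on every defining position."""
    return all(c == "*" or c == s for s, c in zip(string, schema))

def schema_probability(P, schema):
    """P(xi): total proportion of strings that contain the schemata."""
    return sum(p for s, p in P.items() if contains(s, schema))

def words(positions, N):
    """All schematas with the given defining positions (one per binary word)."""
    for bits in product("01", repeat=len(positions)):
        schema = ["*"] * N
        for pos, bit in zip(positions, bits):
            schema[pos] = bit
        yield "".join(schema)

def count_schemata(N):
    """Number of binary schematas with at least one defining bit."""
    return sum(1 for s in product("01*", repeat=N) if any(c != "*" for c in s))

# Hypothetical distribution over all strings of N = 4 bits.
N = 4
strings = ["".join(bits) for bits in product("01", repeat=N)]
norm = sum(range(1, len(strings) + 1))
P = {s: (rank + 1) / norm for rank, s in enumerate(strings)}
total = sum(schema_probability(P, xi) for xi in words((0, 2), N))
```

Here `total` equals one for any choice of defining positions, and `count_schemata(4)` gives $3^4 - 1 = 80$.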

Considering mutation without crossover we "coarse grain" the microscopic equation (2)
by summing over all $C_i \supset \xi$. One can write an effective evolution equation for schematas
evolving under mutation

$$P(\xi,t+1) = \mathcal{P}(\xi)P'(\xi,t) + \sum_{\xi_i\neq\xi}\mathcal{P}(\xi_i\to\xi)P'(\xi_i,t) \tag{19}$$

where the effective coefficients $\mathcal{P}(\xi)$ and $\mathcal{P}(\xi_i\to\xi)$ are

$$\mathcal{P}(\xi) = \prod_{k=1}^{N_2}(1-p(k)) \tag{20}$$

and 

In the latter the sum is over strings $C_j$ that contain the schemata $\xi_i$, where $\xi_i$ differs in at
least one bit from $\xi$ on the $N_2$ defining bits of the schemata.

As with strings the effect of recombination in the form of crossover is two fold: it 
potentially destroys schematas that were present in one parent but not the other; on the 
other hand it offers the possibility of reconstructing schematas even though they were not 
present in either parent. To derive an evolution equation for schematas, including in the 
effects of crossover, we return to equation (13) and sum over all strings C{ D £. One finds 

N-l 

P(£, t + 1) = (1 - p c )P'(£, t) + jf^ E P '( C *> f ) ( 22 ) 

We now break the sum over crossover points into those that cut the schemata itself and
those that cut outside the schemata. In the reconstruction term if the cut is outside
the schemata, to the right say, then the sum over $C_i^R$ is one. Similarly if the cut is to
the left with the sum over $C_i^L$. The remaining sums yield $P'(\xi,t)$ and this term cancels
with an analogous expression originating in the destruction term. For the reconstruction
contribution from cuts in the schemata we denote by $\eta_L$ ($\eta_R$) the bits to the left (right) of
the crossover point that are not in the schemata and note that

$$\sum_{C_i \supset \xi} P'(C_i^L,t)\, P'(C_i^R,t) = \sum_{\eta_L} \sum_{\eta_R} P'(C_i^L,t)\, P'(C_i^R,t). \qquad (23)$$

We will denote by $\xi_L$ and $\xi_R$ the parts of the schemata to the left and right of the
crossover point respectively. Now, $\sum_{\eta_L} P'(C_i^L,t) = P'(\xi_L,t)$, where by definition
$P'(\xi_L,t) = (\bar f(\xi_L,t)/\bar f(t))\, P(\xi_L,t)$, $\bar f(\xi_L,t)$ being the average fitness of the schemata $\xi_L$.
Analogous expressions hold for $\xi_R$. With these results the final form of the schemata evolution
equation including crossover is

$$P(\xi,t+1) = P'(\xi,t) - \frac{p_c}{N-1} \sum_{k=1}^{l-1} \left( P'(\xi,t) - P'(\xi_L,t)\, P'(\xi_R,t) \right) \qquad (24)$$

where the sum is only over crossover points that cut the schemata.

The interpretation of this equation is very similar to that of (13). In the reconstruction
term $P'(\xi_L,t)\, P'(\xi_R,t)$ is the probability that one parent is selected that contains the left
part of the schemata and the other contains the right part. A schemata will be augmented
by the effects of crossover if, as in the string case, its constituent parts are selected more
than the whole schemata. Compared with (13) a further coarse graining has been carried
out by summing over all the states of bits outside of $\xi$. Combining now the effects of
selection, mutation and crossover the schemata evolution equation is

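$$P(\xi,t+1) = \mathcal{P}(\xi)\, P_c(\xi,t) + \sum_{\not\xi_i} \mathcal{P}(\not\xi_i \to \xi)\, P_c(\not\xi_i,t) \qquad (25)$$

where

$$P_c(\xi,t) = P'(\xi,t) - \frac{p_c}{N-1} \sum_{k=1}^{l-1} \left( P'(\xi,t) - P'(\xi_L,t)\, P'(\xi_R,t) \right) \qquad (26)$$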

This evolution equation is the fundamental equation governing the evolution of schematas 
and is written at a "semi-microscopic" level in that it is written in terms of individual 
schematas. It represents an exact coarse-graining of the corresponding string evolution 
equation after summing over all possible states of the non-schemata degrees of freedom. 
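
The claim that the schemata equation is an exact coarse-graining can be checked numerically for a small system. The sketch below is illustrative only: it assumes an infinite population, no mutation, and a string-level one-point crossover update of the form $P(C,t+1) = (1-p_c) P'(C,t) + \frac{p_c}{N-1} \sum_k P'(C^L,t) P'(C^R,t)$, which is the form implied by (22). The random initial distribution, the random fitness table and all function names are assumptions made for the example.

```python
import itertools
import random

def marginal(dist, fixed):
    """Probability that a string matches every (position -> bit) pair in `fixed`."""
    return sum(p for s, p in dist.items() if all(s[i] == b for i, b in fixed.items()))

def select(dist, fitness):
    """Proportional selection: P'(C) = f(C) P(C) / (mean fitness)."""
    mean_f = sum(fitness[s] * p for s, p in dist.items())
    return {s: fitness[s] * p / mean_f for s, p in dist.items()}

def crossover_strings(sel, n, pc):
    """Assumed string-level one-point crossover update (no mutation)."""
    new = {}
    for s in sel:
        recon = 0.0
        for k in range(1, n):  # cut between positions k-1 and k
            left = {i: s[i] for i in range(k)}
            right = {i: s[i] for i in range(k, n)}
            recon += marginal(sel, left) * marginal(sel, right)
        new[s] = (1 - pc) * sel[s] + pc / (n - 1) * recon
    return new

def crossover_schema(sel, schema, n, pc):
    """Schema-level update, equation (24): only cuts inside the schema contribute."""
    first, last = min(schema), max(schema)
    p_sel = marginal(sel, schema)
    total = 0.0
    for k in range(first + 1, last + 1):  # the l - 1 cuts that split the schema
        left = {i: b for i, b in schema.items() if i < k}
        right = {i: b for i, b in schema.items() if i >= k}
        total += p_sel - marginal(sel, left) * marginal(sel, right)
    return p_sel - pc / (n - 1) * total

rng = random.Random(0)
N, PC = 5, 0.7
strings = list(itertools.product((0, 1), repeat=N))
weights = [rng.random() for _ in strings]
dist = {s: w / sum(weights) for s, w in zip(strings, weights)}
fitness = {s: 0.5 + rng.random() for s in strings}

sel = select(dist, fitness)
next_dist = crossover_strings(sel, N, PC)
schema = {1: 1, 3: 0}                        # the schema *1*0*, length l = 3
lhs = marginal(next_dist, schema)            # evolve strings, then coarse-grain
rhs = crossover_schema(sel, schema, N, PC)   # evolve the schema directly
print(abs(lhs - rhs))
```

The two routes agree to rounding error because cuts outside the schema contribute $P'(\xi,t)$ to both the destruction and the reconstruction terms, so only the $l-1$ cuts inside the schema survive.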

Another useful concept we will introduce here is that of "effective fitness", $f_{\rm eff}(\xi,t)$,
which we define via the relation

$$P(\xi,t+1) = \frac{f_{\rm eff}(\xi,t)}{\bar f(t)}\, P(\xi,t) \qquad (27)$$

comparing with equation (25) one finds




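$$f_{\rm eff}(\xi,t) = \mathcal{P}(\xi) \left[ \left( 1 - p_c \frac{l-1}{N-1} \right) \bar f(\xi,t) + \frac{p_c}{N-1} \sum_{k=1}^{l-1} \frac{\bar f(\xi_L,t)\, \bar f(\xi_R,t)}{\bar f(t)} \frac{P(\xi_L,t)\, P(\xi_R,t)}{P(\xi,t)} \right] + \frac{\bar f(t)}{P(\xi,t)} \sum_{\not\xi_i} \mathcal{P}(\not\xi_i \to \xi)\, P_c(\not\xi_i,t) \qquad (28)$$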





Thus we see that the effect of mutation and crossover is to "renormalize" the "bare"
fitness $\bar f(\xi,t)$. The destructive effects of crossover and mutation give a multiplicative type
renormalization whilst the reconstruction terms give an additive type renormalization. In
the low "temperature" limit where mutation and crossover go to zero $f_{\rm eff}(\xi,t) \to \bar f(\xi,t)$.

Another concept we will find useful is that of an effective selection coefficient
$s_{\rm eff} = f_{\rm eff}(\xi,t)/\bar f(t) - 1$. If we think of $s_{\rm eff}$ as being approximately constant in the vicinity of
time $t_0$, then $s_{\rm eff}(t_0)$ gives us the exponential rate of increase or decrease of growth of the
schemata $\xi$ at time $t_0$.

4. Effective Degrees of Freedom and Coarse Graining 

As mentioned in the introduction one of the most important steps in obtaining a 
qualitative and quantitative understanding of a system is deciding what are the relevant 
degrees of freedom of the system. This is often a quite difficult thing to do owing to 
the fact that they are "scale" dependent. In the case of evolution dynamics this means 
that the effective dynamics depends on the time scale considered. Trying to understand 
such time dependent behaviour quantitatively is very difficult as almost inevitably one will 
have to resort to an approximation technique, which invariably depends on focusing on the 
relevant effective degrees of freedom as in the methods of effective field theory. However, 
if they are time dependent then what starts as a good approximation focusing on a certain 
type at one time will usually break down as one approaches time scales where they are 
qualitatively quite different. 

One feature that is very common, if one has found a reasonable set of effective degrees 
of freedom, is that their mutual interactions are not very strong, so that they have a 
certain degree of integrity. Calling something an effective degree of freedom is not a very 
useful thing to do if it is not readily identifiable as such. For instance, in low energy QCD 
gluons and quarks are not very useful concepts as they are so strongly coupled via highly 
non-linear interactions that they form baryons and mesons, bound states of the former. 
The latter have a much higher degree of integrity than the former at such energies. 

So how is the above related to the present discussion of GA's? GA's, as algorithmic
representations of complex systems, have many degrees of freedom. For instance, in the
case where the state of a string of size $N$ is defined as a binary word, for a population of size $n$
the total number of possible states is $\sim (2^N)^n$ in the case where strings are identifiable by
a label other than the state of their bits, and $\sim n^{2^N}$ in the case where permutations of
identical strings are not counted separately. Both are exponentially large numbers.

Ideally, the search for optima proceeds in a smaller space, spanned by effective
"coarse-grained" degrees of freedom. The traditional answer to the question: "What is the nature
of these degrees of freedom?" is, as mentioned previously, the "building block hypothesis":
that small segments of string have an activity that is relatively decoupled from the rest of 
a string — these "blocks" are assumed to be sufficiently compact that they have a high 
probability of being preserved by crossover. The GA supposedly uses these building blocks 
in order to arrive at a global solution. 

We can get some idea of the dynamical behaviour of schemata due to crossover by
restricting attention for the moment to a flat fitness landscape. In this case

$$P(\xi,t+1) = P(\xi,t) - \frac{p_c}{N-1} \sum_{k=1}^{l-1} \left( P(\xi,t) - P(\xi_L,t)\, P(\xi_R,t) \right) \qquad (29)$$

For an uncorrelated population crossover is completely neutral and we have a scale invariant
situation.

To solve the evolution equation (29) in the case of a correlated population one needs to
solve the corresponding equations for $\xi_L$ and $\xi_R$, and these will involve reconstruction terms that
contain $\xi_{LL}$, $\xi_{LR}$, $\xi_{RL}$ and $\xi_{RR}$. The first two are the components of $\xi_L$ and the latter two of
$\xi_R$. Naturally this process can be iterated relating fine grained degrees of freedom to more
and more coarse grained degrees of freedom, where more and more bits $(N - N_2)$ have been
summed over. Obviously when one arrives at one schematas, the maximally coarse grained
degrees of freedom, the process stops as one cannot split such schematas by crossover. We
see then that crossover leads to a hierarchy of equations relating fine grained degrees of
freedom to successively more and more coarse grained degrees of freedom.

Restricting attention to two schematas in the flat fitness landscape setting and considering
the continuous time limit one arrives at the following differential equation

$$\frac{dP(ij,t)}{dt} = -p_c \frac{l-1}{N-1} \left( P(ij,t) - P(i,t)\, P(j,t) \right) \qquad (30)$$

where $i$ and $j$ are the definite bits that define the two schemata and also the two one
schematas respectively. As one cannot split a one schemata, $P(i,t)$ and $P(j,t)$ are conserved
quantities; thus one finds

$$P(ij,t) = P(ij,0)\, e^{-p_c \frac{l-1}{N-1} t} + P(i,0)\, P(j,0) \left( 1 - e^{-p_c \frac{l-1}{N-1} t} \right) \qquad (31)$$

Thus one sees that $P(ij,t)$ approaches an uncorrelated fixed point $P^*(ij) = P(i,0)\, P(j,0)$
exponentially rapidly. The sole effect of the size of the schemata is to govern the rate
of approach to the fixed point, an exponentially small preference being given to smaller
schematas.
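
The relaxation described by (29) to (31) is easy to reproduce numerically. The following minimal sketch iterates the discrete-time flat-landscape equation for a 2-schemata and also evaluates the continuous-time solution; the parameter values ($p_c = 0.8$, $N = 20$, initial probabilities) and the function names are illustrative assumptions.

```python
import math

def relax_two_schema(p_ij0, p_i, p_j, pc, length, n, steps):
    """Iterate equation (29) for a 2-schemata on a flat landscape.

    P(i) and P(j) are conserved, so they enter as constants.
    """
    rate = pc * (length - 1) / (n - 1)
    p_ij = p_ij0
    history = [p_ij]
    for _ in range(steps):
        p_ij = p_ij - rate * (p_ij - p_i * p_j)
        history.append(p_ij)
    return history

def continuous_solution(p_ij0, p_i, p_j, pc, length, n, t):
    """Equation (31), the continuous-time limit."""
    decay = math.exp(-pc * (length - 1) / (n - 1) * t)
    return p_ij0 * decay + p_i * p_j * (1 - decay)

# A positively correlated pair of bits, P(ij,0) = 0.40 > P(i) P(j) = 0.25.
short = relax_two_schema(0.40, 0.5, 0.5, pc=0.8, length=2, n=20, steps=200)
long_ = relax_two_schema(0.40, 0.5, 0.5, pc=0.8, length=15, n=20, steps=200)
print(short[10], long_[10], continuous_solution(0.40, 0.5, 0.5, 0.8, 15, 20, 200.0))
```

Both schemata relax to the same uncorrelated fixed point $P(i,0) P(j,0)$; the length only sets how quickly the initial correlation is forgotten, the shorter schemata retaining it for longer.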

The steady state solution for a schemata $\xi$ of order $N_2$ is

$$P^*(\xi) = \prod_{i=1}^{N_2} P(\xi(i),0) \qquad (32)$$

where $P(\xi(i),0)$ is the probability of finding the one schemata corresponding to the $i$'th
bit of $\xi$ at $t = 0$. One can verify that this steady state solution also is a result purely of
the effects of reconstruction. Without reconstruction there is no fixed point other
than zero! We see then that reconstruction is the driving force of crossover and will always
come to dominate. This is very much contrary to the standard block hypothesis point of
view which treats schemata destruction as the dominant effect. We can also make another
interesting observation associated with the effective fitness $f_{\rm eff}(\xi,t)$ and crossover. Here
the effect of crossover is to renormalize the fitness. The effective selection coefficient is

$$s_{\rm eff} = -p_c \left( \frac{l-1}{N-1} \right) + p_c \left( \frac{l-1}{N-1} \right) \frac{P(i,t)\, P(j,t)}{P(ij,t)} \qquad (33)$$

Thus schemata destruction gives a multiplicative renormalization that contributes nega- 
tively to the effective fitness advantage. However, schemata reconstruction leads to an 
additive renormalization of the effective fitness which exceeds the contribution of the de- 
struction term if i and j are negatively correlated. 

In general the fitness landscape itself induces correlations between $\xi_L$ and $\xi_R$. In
this case there is a competition between the (anti-) correlating effect of the landscape
and the mixing effect of crossover. Selection itself more often than not induces an
anti-correlation between fit schemata parts, rather than a positive correlation. Indeed, in the
neutral case of a $k = 0$ landscape one has
$1 + \frac{N_2}{N} \frac{\delta f_\xi}{\bar f} < \left( 1 + \frac{N_L}{N} \frac{\delta f_{\xi_L}}{\bar f} \right) \left( 1 + \frac{N_R}{N} \frac{\delta f_{\xi_R}}{\bar f} \right)$,
so selection induces an anticorrelation when $\delta f_{\xi_L}, \delta f_{\xi_R} > 0$: in an uncorrelated initial
population, $P'(\xi,t) < P'(\xi_L,t)\, P'(\xi_R,t)$. This means that crossover plays an important
role in allowing both parts of a successful schemata to appear in the same individual.

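We can analyze this effect in more detail taking once again the case of 2-schematas.
Defining the correlation $C(ij,t) = (P(ij,t)/P(i,t)P(j,t)) - 1$ then in terms of the selection
coefficient, $s_\xi = \bar f(\xi,t)/\bar f - 1$, one finds

$$C(ij,t+1) = \left( 1 - p_c \frac{l-1}{N-1} \right) \left( \frac{(1 + s_{ij})}{(1 + s_i)(1 + s_j)}\, C(ij,t) + \frac{s_{ij} - (s_i s_j + s_i + s_j)}{(1 + s_i)(1 + s_j)} \right) \qquad (34)$$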



Note that the effect of crossover is to diminish correlations induced by the fitness landscape;
however, crossover cannot change the sign of the correlations. The larger the value of $l$ in
this simple case the more the correlations are damped.

This is the effect which we saw previously in the context of a flat landscape. In the
extreme case $l = N$, $p_c = 1$ the effect of crossover is to eliminate all correlation between $i$
and $j$. In the neutral ($k = 0$) case,

$$C(ij,t+1) = \left( 1 - p_c \frac{l-1}{N-1} \right) \left( \frac{(1 + s_i + s_j)}{(1 + s_i)(1 + s_j)}\, C(ij,t) - \frac{s_i s_j}{(1 + s_i)(1 + s_j)} \right). \qquad (35)$$

Thus the effect of crossover is to weaken but not cancel completely the anti-correlations
induced by $k = 0$ selection. In the remainder of this section we will consider this effect for
general schematas.
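
The correlation update for 2-schematas can be cross-checked against a direct application of selection followed by the crossover equation (24). The sketch below is a hedged illustration: the selection coefficients are treated as given inputs, the neutral case is imposed by setting $s_{ij} = s_i + s_j$, and the numerical values and function names are assumptions chosen for the example.

```python
def step_two_schema(p_ij, p_i, p_j, s_ij, s_i, s_j, a):
    """Selection followed by crossover for a 2-schemata (ij) and its two bits.

    a = p_c * (l - 1) / (N - 1) is the probability that crossover cuts the schema.
    """
    sel_ij = (1 + s_ij) * p_ij   # P'(ij, t)
    sel_i = (1 + s_i) * p_i      # P'(i, t)
    sel_j = (1 + s_j) * p_j      # P'(j, t)
    new_ij = sel_ij - a * (sel_ij - sel_i * sel_j)   # equation (24)
    return new_ij, sel_i, sel_j  # one schematas cannot be cut by crossover

def correlation(p_ij, p_i, p_j):
    """C(ij) = P(ij) / (P(i) P(j)) - 1."""
    return p_ij / (p_i * p_j) - 1

def neutral_prediction(c_old, s_i, s_j, a):
    """Correlation after one step in the neutral case s_ij = s_i + s_j."""
    denom = (1 + s_i) * (1 + s_j)
    return (1 - a) * ((1 + s_i + s_j) / denom * c_old - s_i * s_j / denom)

a = 0.8 * (6 - 1) / (20 - 1)       # p_c = 0.8, l = 6, N = 20 (illustrative)
p_ij, p_i, p_j = 0.25, 0.5, 0.5    # uncorrelated start, C = 0
s_i, s_j = 0.2, 0.1
new = step_two_schema(p_ij, p_i, p_j, s_i + s_j, s_i, s_j, a)
c_direct = correlation(*new)
c_formula = neutral_prediction(correlation(p_ij, p_i, p_j), s_i, s_j, a)
print(c_direct, c_formula)
```

Starting from an uncorrelated population with both bits advantageous, the correlation after one generation is negative, and the factor $(1 - a)$ shows crossover shrinking it without changing its sign.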

In our search for the relevant effective degrees of freedom and in analysing the building
block hypothesis we will consider schematas of length $l$ irrespective of their order or their
overall position in a string. It should be clear that this is a further coarse graining relative
to the evolution equations considered earlier. Unfortunately the evolution equation (25)
by itself is not very useful for analysing schematas of size $l$, the reason being that any given
string contains schematas of all sizes. However, consideration of just about any quantity
in conjunction with (25) and a sum over schematas of a given length is meaningful. For
instance, one could consider how $\bar f(\xi,t)$ changes in time and subsequently how its average,
$\langle \bar f(\xi,t) \rangle_l$, over all possible schematas of size $l$ changes. Our notation here is that for
any function $A(\xi,t)$,

$$\langle A(t) \rangle_l = \frac{\sum_{i=1}^{N-l+1} \sum_{N_2=2}^{l} \sum_{\{N_2\}} \sum_{\text{words}} P(\xi,t)\, A(\xi,t)}{(N-l+1)\, 2^{l-2}} \qquad (36)$$

where $l \ge 2$. The first sum is over the possible beginning point, $i$, of the schemata and
the following two sums represent the different configurations of any number $N_2 - 2 \le l - 2$ of
specified bits chosen among the $l - 2$ available sites. The number of available sites is $l - 2$
because we fix the end bits.

Using (25) one may derive a recursion relation for the expectation value of the observable
$A$, from

$$\langle A(t+1) \rangle_l = \frac{\sum_{i=1}^{N-l+1} \sum_{N_2=2}^{l} \sum_{\{N_2\}} \sum_{\text{words}} P(\xi,t+1)\, A(\xi,t+1)}{(N-l+1)\, 2^{l-2}} \qquad (37)$$

We now encounter a difficulty: time dependence enters in the above equation not only in the
changing probability distribution $P(\xi,t+1)$, which can be substituted using (25), but also
in $A(\xi,t+1)$. This occurs even though many observables of interest are time-independent
functions of the string states, as the summing over degrees of freedom associated with
passing to a more coarse grained description induces an implicit time dependence in the
coarse grained observables. For example $\bar f(\xi,t)$ is a population-dependent observable even
though $f(C_i)$ is not.

One can derive a function on schematas, such as schemata fitness, via a population 
average with the string probability P(Ci, t + 1), for which one has the microscopic evolution 
equation, but this clearly leads to a very complicated calculation. To simplify matters, to 
search for structure in the population we define a time-independent function on schematas. 
The particular function we choose is the average selective advantage that in-schemata bits 
would enjoy if the schemata were immersed in a random population, 

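$$\delta f_\xi = \left( \frac{1}{N_2} \right) \left( \frac{1}{2^{N-N_2}} \right) \sum_{\eta\text{-words}}^{2^{N-N_2}} \left( f_\xi(\eta) - \frac{1}{2} \right), \qquad (38)$$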

where $\eta$ represents the out-of-schemata bits and the average fitness in a random population
has been normalized to 1/2. Note that here, and in the rest of the paper, we are looking
at the fitness deviation per schemata bit as opposed to section 3 where the total fitness of
a schemata was being considered. This observable corresponds to the effective fitness of
in-schemata bits either if the population is in fact random, or if the landscape assigns an
independent fitness contribution to each bit in the chromosome ($k = 0$ in the terminology
of the Kauffman $Nk$-model). In general, it is a useful test function with which one can
probe for the emergence of structure during the first steps of evolution away from a random
initial population. We will refer to this observable below as the in-schemata fitness.

We will make use below of the following simplified averages: if $A(\xi)$ is independent
of the initial defining point of the schemata, or if the landscape is "generic", then we can
sum over this point to find

$$\langle A(t) \rangle_l = \frac{1}{2^{l-2}} \sum_{N_2=2}^{l} \sum_{\{N_2\}} \sum_{\text{words}} P(\xi,t)\, A(\xi) \qquad (39)$$

By "generic" we mean that within the class of landscapes we are considering, such as an
$Nk$-model for a particular value of $k$, there is no systematic bias in the fitness function
for a particular part of the string, i.e. the sums over words, configurations and $N_2$ lead
to an average which is effectively translation invariant, i.e. the system is effectively
self-averaging. Similarly, if $A(\xi)$ depends only on the order of the schemata, $N_2$, one has

$$\langle A(t) \rangle_l = \frac{1}{2^{l-2}} \sum_{N_2=2}^{l} {}^{l-2}C_{N_2-2}\, P(\xi,t)\, A(\xi) \qquad (40)$$

We will also use the notation $\langle\langle A \rangle\rangle_l$ to represent the average over schematas and over
crossover points, namely

$$\langle\langle A \rangle\rangle_l = \frac{1}{l-1} \sum_{k=1}^{l-1} \langle A \rangle_l \qquad (41)$$

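Considering the expectation value of the in-schemata fitness, the equation which gives
the improvement of $\langle \delta f_\xi \rangle_l$ from generation $t$ to generation $t + 1$ is

$$\Delta_l = \left\langle\left\langle \frac{\delta f_{\rm eff}(\xi,t)}{\bar f(t)}\, \delta f_\xi \right\rangle\right\rangle_l \qquad (42)$$

where $\delta f_{\rm eff}(\xi,t) = f_{\rm eff}(\xi,t) - \bar f(t)$. More explicitly, using the evolution equation for
schematas, one finds

$$\Delta_l = \left\langle \frac{\delta f(\xi,t)}{\bar f(t)}\, \delta f_\xi \right\rangle_l - p_c \left( \frac{l-1}{N-1} \right) \left\langle\left\langle \frac{\delta f_\xi}{P(\xi,t)} \left( P'(\xi,t) - P'(\xi_L,t)\, P'(\xi_R,t) \right) \right\rangle\right\rangle_l \qquad (43)$$

where $\delta f(\xi,t) = \bar f(\xi,t) - \bar f(t)$. The first term is independent of $l$ in a random population
if the fitness landscape itself is $l$ independent. All the $l$ dependence lies in the crossover
terms. If the parts of a schemata are selected more than the whole it is clear that the net
contribution from crossover will be positive.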

It is worth pausing here to consider the meaning of the quantity $\Delta_l$. As we defined it, $\Delta_l$
measures the average improvement of the in-schemata fitness over one step of evolution.
How does this improvement come about? First of all, schematas with $\delta f(\xi,t) > 0$ will be
more frequent in the parent population, thanks to the selection factor $(1 + s)$, where

$$s = \frac{\delta f(\xi,t)}{\bar f(t)} = \frac{N_2}{N} \frac{\delta f_\xi}{\bar f(t)}, \qquad (44)$$

where the latter equality is only true for a random initial population or $k = 0$ model. The
next step is to consider the action of the crossover operator. On the one hand selected
parents with $\xi$ may not pass it on to their offspring if crossover "breaks" the schemata.
However, there is a possibility that $\xi$ be reconstructed from parents that have parts of $\xi$
but not all of it. This reconstruction term gives a positive contribution because if $\xi$ has
a selective advantage then subsets of $\xi$ will, for an average type of landscape, be more
likely than not to have some selective advantage as well, so the parts of $\xi$ that are needed
for reconstruction are available in the parent population with an enhanced probability.
The key question is, of the destruction and the reconstruction terms which is larger for a
particular value of $l$? Before we turn to answering this question in particular cases, let us
consider the relation between $\Delta_l$ and the spatial correlation function.

If the population is uncorrelated, in other words if $P(\xi,t) = \prod_i P(\xi_i,t)$, where $\xi_i$ is
the $i$'th bit of $\xi$, then the expectation value of $\delta f_\xi$ is independent of $l$, as
$N_2\, \delta f_\xi = \sum_i \delta f_{\xi_i}$ and

$$\left\langle \sum_i \delta f_{\xi_i} \right\rangle_l \qquad (45)$$

is just the uncorrelated sum of contributions from 1-schematas. The fact that the existence
of correlations in $P(\xi,t+1)$ implies an $l$ dependence can be demonstrated explicitly. One
writes

1 N 2 N 2 

MM ^w 2 ^ m) + iv 2 (iv 2 -i) ^^ ( /2( ^ } " 2 (/lte) + ) ' (46) 



where 



/lte) = p^T E MMdtk}) (47) 

/2(^,) = ^2 E MMdtk}) (48) 

and we are considering only up to two-point correlations. Defining $\delta s_\xi = \delta f_{\rm eff}(\xi,t) - \delta f_\xi$,
which is a measure of the selective advantage over and above the in-schemata fitness, one
finds

N 2 N 2 _ / 1 x 

fit) - 5 ft- 



For a $k = 0$ landscape,



^ - jv^n) E E (/2(^o - \um + a (£;))) 



(49) 



(50) 



So, in this case we see that the existence of a selective advantage is due to the existence of
correlations in the effective fitness. Defining a selective coefficient $S_l$ which represents the
selective advantage for a schemata to be of size $l$ one finds

$$\Delta_l = \langle\langle \delta f_\xi^2 \rangle\rangle_l\, (1 + S_l) \qquad (51)$$

where 

Sl = -L* « . (52) 

In this expression for $\Delta_l$, $\langle\langle \delta f_\xi^2 \rangle\rangle$ is independent of $l$ for a random initial population.
Thus we see that any $l$ dependence can be attributed to the existence of spatial correlations.
If the reconstruction term from crossover exceeds the destruction term for some $l$, then
from the above one concludes that the fitness improvement attributed to a particular bit
in the string depends on it being part of selected schematas of this size. The maximum
value of $\Delta_l$ is attained when the contribution of an individual bit is most enhanced by
the fact that this bit belongs to strings that include other specified bits at a distance at
most equal to $l$, namely those strings which include a selected schemata of size $l$. That
the conditioning information on the existence of other specified bits should be useful is a
direct consequence of the correlations between the different bits in the string. The reason
why we emphasize the relation between $\Delta_l$ and the correlation function is that correlations
are intimately linked to the emergence of effective degrees of freedom. In this sense, the
function $\Delta_l$ is related to the expected size distribution of the effective degrees of freedom.



5. Asymptotic Solutions 

In this section we consider some asymptotic solutions of the evolution equation for
$\Delta_l$ derived in section 4. In particular we will consider two limiting cases: the evolution
of schematas starting from a completely random initial state; and a random perturbation
around a completely ordered state. As one of our principal considerations is in investigating
the validity of the building block hypothesis we will set the mutation rate to zero as the
effects of the latter do not depend on schemata size. We will derive expressions for $\Delta_l(t+1)$
and $\Delta_l(t+2)$ starting out from an initial random population at time $t$.

For a random initial population at time $t$,

$$\Delta_l(t+2) = \langle\langle \delta f_\xi(t+2) \rangle\rangle_l - \Delta_l(t+1) \qquad (53)$$

Even though $\delta f_\xi$ is time independent we use the above notation to indicate that its
expectation value is with respect to the probability distribution at time $t + 2$.

In the initial random population, the effective schemata fitness is the in-schemata
fitness $\delta f_\xi$ and

$$\delta f(\xi,t) = \frac{N_2}{N}\, \delta f_\xi, \qquad (54)$$

Thus one finds

+ 1) = (l " Pc(jfzj)) «a»i + 

+ > + (56) 

where we have introduced the notation for the quadratic terms 

- = i £ 2 ^fl m 

words 

& = Pb £ tW&' (58) 

words
with an analogous expression for

As one of our principal purposes here is to examine the block hypothesis in light of
the evolution equation we have derived, and the associated notions of coarse graining
and effective degrees of freedom, we will try to derive explicit results in some concrete
cases based on generic fitness landscapes. The Kauffman $Nk$-models provide such a set of
landscapes. Here we will specialize to the case $k = 0$ which is neutral in the sense that it
neither favours nor disfavours correlations between bits. We will discuss how our results
generalize to a $k = 2$ landscape in the next section. In the $k = 0$ landscape,

$$\delta f_\xi = \frac{N_L}{N_2}\, \delta f_{\xi_L} + \frac{N_R}{N_2}\, \delta f_{\xi_R}. \qquad (59)$$

We also have that $\langle\langle \delta f_\xi\, \delta f_{\xi_L}\, \delta f_{\xi_R} \rangle\rangle_l = 0$, which results in the complete cancellation of
the destruction and reconstruction crossover terms, the final result being

$$\Delta_l(t+1) = \langle\langle \alpha \rangle\rangle_l. \qquad (60)$$

The above expression is for an arbitrary $k = 0$ landscape. In order to find a more
explicit solution we must consider a more explicit landscape. We will consider two: a
binary landscape where the fitness of a bit may only take two values, 1 and 0; and a
landscape where the fitness of a bit is selected uniformly at random from the interval [0, 1].
Both landscapes conform with the requirement that the average fitness per bit in a random
population is 1/2. Let $x_i$ denote the deviation from the mean fitness of bit number $i$, i.e.
$x_i = f_i - 1/2$. We find

$$N_2\, \delta f_\xi = \sum_{i=1}^{N_2} (2\xi_{n_i} - 1)\, x_{n_i},$$

where the indices $n_i$ denote the specified bits of the schemata ($i = 1, \cdots, N_2$). Squaring
this expression and using $\sum_{i \ne j} \langle (2\xi_{n_i} - 1)(2\xi_{n_j} - 1) \rangle = 0$ one finds

$$\langle N_2^2\, \delta f_\xi^2 \rangle = \left\langle \sum_{i=1}^{N_2} x_{n_i}^2 \right\rangle.$$



The averaging over configurations then gives, for / > 3, 

= 2 ^ 



N 



i=l 



where 

m % = 1(1 - - 2) (l<i< N-l + 1), 

m% = (I 2 -31) +z(/ 2 -5/ + 8) + l -^p2~ (*<0> 

and symmetrically for i > N — I. 

For the case of a binary landscape the final answer is 



1

<< a »i= — . (61)

In the random landscape for large $N$ we can assume that the average over the $N$ bits
(weighted by $m_i$) can be replaced by an average over the distribution of $x_i$ used to generate
the landscape. Then,

«a»i=— . 62 

Thus one sees that crossover acts in a scale invariant way at the first time step of evolution 
from a random initial population: there is no preference whatsoever for small blocks at 
the expense of large blocks. 

We will now consider what happens at time $t + 2$. The extra ingredient we need relative
to the above calculation is $\langle\langle \delta f_\xi(t+2) \rangle\rangle_l$. To calculate this we in turn need to calculate
$P'(\xi,t+1)$, $P'(\xi_L,t+1)$ and $P'(\xi_R,t+1)$, i.e. the selection probabilities at time $t + 1$,
calculation of which requires knowledge of $\bar f(\xi,t+1)$, $P(\xi,t+1)$ and $\bar f(t+1)$.

Specializing once again to a $k = 0$ landscape, one finds

P«, t + 1) = ^ ( 1 + ^Sf ( + ^ £ ^HJhn ) • («0 



and 

P'&t+l) 



2^(1 + 2a s ) 



$$\bar f(t+1) = (1 + 2\alpha_s)\, \bar f(t) \qquad (64)$$



{l + —df{) +2 — U { N-N 2 ) 



+ iV^I^ 1 + "aT^ ^ —^2— 6f ^L S kR 

k=l 

*- m tl^ \J2( N R 6 k R P(k-»L) + N L^ L P(N- k -N R) + N 2 Sf^(J2 fa + £ A 



iV(iV — 1) 
where 



fc=l fc<£ fc>£ 

(65) 

1 ^ 2(iV-iV 2 ) 2 

r\— words 

1 /2(fc-A^)\ 2 ^ 2 

?7l— words 
L— words 

In (68) the sum is over words associated with bits to the left of the crossover point given
that all the schemata $\xi$ lies to the right of the crossover point. The expression for $\beta_{(N-k)}$
is analogous but with the sum over words being associated with bits to the right of the
crossover point given that the schemata lies to the left. Equation (67) is associated with
a sum over words for the bits to the left of the crossover point but excluding bits that are
in the schemata. Likewise the expression $\beta_{(N-k-N_R)}$ contains a sum over words associated
with bits that are out of schemata to the right of the crossover point. Finally,
$\alpha_s = 1 + (2N_2/N)\alpha + (2(N - N_2)/N)\alpha_{(N-N_2)}$.

If one considers $\xi_L$ and $\xi_R$ as schematas on exactly the same footing as $\xi$ then the
expressions for $P'(\xi_L,t+1)$ and $P'(\xi_R,t+1)$ are completely analogous to those above
except that one is now considering the bits of $\xi_L$ and $\xi_R$ which lie to the left and the right
of the crossover point. Combining these expressions with equations (65-68) after some
lengthy but straightforward calculations one finds

A,(t + 2) = (iriS K< " >>l + (l + 2a!)V-D <K 1^ + M "- k ' ] >>l 

+ (i + 2« s )(iv-i) < a 2^* + a fa-*) >i 

(69) 

The first term in the right hand side of (69) is the result of selection at $t + 1$ on the
population that was the result of selection at $t$. It is crossover independent as is manifest
in the fact that $p_c$ does not appear. The last two terms are associated with the effects of
selection on the population at time $t + 1$ which has incorporated non-trivial contributions
from crossover at time $t$. More precisely, the picture is the following: $k = 0$ selection on a
random population induces anti-correlations in $P'(C_i)$ when both $\delta f_{\xi_L}$ and $\delta f_{\xi_R}$ are
positive due to the quadratic term $\sim \delta f_{\xi_R}\, \delta f_{\xi_L}$. Crossover reduces these anticorrelations,
thereby enhancing the whole schemata $\xi = \xi_L + \xi_R$ relative to its parts. Selection at $t + 1$
reinforces this effect of crossover to enhance $\xi$, leading to the net positive contribution to
$\Delta_l(t+2)$.

As above, we will consider the binary landscape and a landscape where the fitness 
of a bit is selected uniformly at random from the interval [0,1]. Similar calculations to 

AT 

the ones given above for << a »i lead to the following expressions for << j^iPiiPk + 
0LP(N-k)) »l and < (a T,k<£ fa + a Efc>£ P{N-k) >h for 1 > 4: 

« + 0L&N-*)) »l= 7^3 ( N ~l)(N-l-2) (70) 

, v—v a ^ _ c(Z-l) c(/ 2 -5/ + 8) c 

where $c = 3$ for the binary landscape and $c = 1$ for the random landscape. Putting these
terms together we arrive at the final expression for $\Delta_l(t+2)$,

A , n , {3N-c\ c f 2N 2 - Nl + l 2 + N + 1-8 - (8/2 1 , . 
A(I + 2| =(3f^j« + 4 144(3* + < W-1) 1 (72 » 

From this expression one can readily see that the effects of crossover are always positive,
i.e. the effects of schemata reconstruction outweigh those of schemata destruction. A
graph of $\Delta_l(t+2)$ versus $l$ can be seen in figure 1.

We now turn our attention briefly to the limiting case of an almost organised population.
In this limit, one can consider that the strings differ from the population-consensus
at most at one site; we will refer to the differing site as a "defect". There are $N$ possible
defects each with an effective negative fitness differential over the consensus string. The
evolution equation implies an equation for the growth or decay of defects, where one
immediately sees that the effect of crossover is strictly neutral: there is no net creation or
destruction of defects by pure crossover without selection. In other words the geometrical
effect of crossover is zero. The role of crossover in this limit is only to mix the defects in
the population. So in this limit $\Delta_l$ is once again strictly independent of $l$. The possibility
of multiple defects in a single string raises the possibility of correlations in the distribution
of defects along the string, which would induce mirroring correlations in the schematas, so
$\Delta_l$ may acquire a non-trivial $l$-dependence as a second order effect in the mean density of
defects, which is the perturbative expansion parameter near the ordered limit.

Taking once again the $k = 0$ landscape, the fitness penalty per bit for two defects is
given by $2\delta f_{ij} = \delta f_i + \delta f_j$, so



Since $\delta f_i\, \delta f_j > 0$, $P'(ij,t) < P'(i,t)\, P'(j,t)$: selection induces an anticorrelation between
the defects. Now, crossover enhances $P(ij,t+1)$ to bring it closer to $P(i,t+1)\, P(j,t+1)$.
Since the schemata $(ij)$ is more strongly damped than $i$ or $j$ separately, the selection at
the next time step will destroy more defects than without crossover. So here again, as
near the random limit, crossover has a beneficial effect due to the enhancement of whole
schematas relative to their parts. Near the random limit this was beneficial because the whole
schemata was picked up by selection; here it is beneficial because the whole schemata is
more strongly damped by negative selection so defects die out more rapidly.

One can think of the initial random population as being the high temperature fixed 
point of the model, given that every point in configuration space is equally occupied; in this 
limit the correlation length is zero. The ordered limit would then naturally be interpreted 
as the low temperature limit. Our results from this section can be summarised by saying 
that crossover is a net positive contributor to fitness growth at second order near both
temperature limits, $T \to 0$ and $T \to \infty$.

6. Effective degrees of freedom in the N2 landscape 

The $k = 0$ landscape discussed in the last section has the virtue of being "neutral" from
the point of view of the block hypothesis; however, it is not a realistic example of landscapes
usually encountered in complex optimization problems: it is strongly correlated, has a
single optimum and does not present frustration. We will therefore turn our attention now
to Kauffman's $Nk$ model with $k = 2$. There are two mechanisms by which connected
landscapes can induce correlations: On the one hand, schematas that contain landscape- 
related bits have a sharper selective coefficient because there are fewer unspecified bits 
involved in their fitness contribution. On the other hand, the balance between the schemata 
destruction and reconstruction terms from crossover is broken to first order. For example 
if the effective fitness of a whole schemata is less than the sum of the effective fitnesses 
of its parts, the growth of a schemata can be magnified by breaking it down into parts, 
growing the parts and then reconstructing the schemata. We will analyse both of these 
correlating effects below. 

The Nk model can be described as follows: the fitness function is specified by giving, 
for each bit of the string, k connections to as many other bits and a table of 2 k+1 random 
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numbers uniformly distributed in the unit interval. To compute the fitness contribution of 
one bit of a string one forms a (k + l)-word from this bit and its connected partners and 

translates it into an integer n G {1, 2, ■ • • , 2 fc+1 }. The fitness is then given by the entry 
number n in the table of random numbers. Note that the k connections and the table of 
random numbers are chosen independently for each bit in the string. For k = the fitness 
contribution from each bit is independent of the others and takes one of two possible values. 
The resulting landscape gives rise to a unique extremum (barring accidental degeneracies). 
At the other extreme, k = N, the fitness of every string is an independent random number 
(random landscape). The ruggedness of the landscape increases with k, which allows one 
to use k as a free parameter in order to be able to model real landscapes of arbitrary 
ruggedness. Here we will consider only the representative case k = 2. 
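
The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the simulations reported here: the function names are hypothetical, the string fitness is assumed to be the plain sum of the per-bit contributions, and the table index is 0-based whereas the text counts entries from 1.

```python
import random

def make_nk_landscape(n_bits, k, rng):
    """Draw k partner indices and a table of 2**(k+1) uniform numbers for every bit."""
    landscape = []
    for i in range(n_bits):
        others = [j for j in range(n_bits) if j != i]
        partners = rng.sample(others, k)              # k connections to other bits
        table = [rng.random() for _ in range(2 ** (k + 1))]
        landscape.append((partners, table))
    return landscape

def nk_fitness(bits, landscape):
    """Sum the per-bit contributions (assumption: string fitness is their sum)."""
    total = 0.0
    for i, (partners, table) in enumerate(landscape):
        word = [bits[i]] + [bits[j] for j in partners]   # the (k + 1)-word
        index = 0
        for b in word:                                   # binary word -> integer
            index = 2 * index + b
        total += table[index]                            # 0-based table entry
    return total

rng = random.Random(0)
landscape = make_nk_landscape(n_bits=40, k=2, rng=rng)
string = [rng.randint(0, 1) for _ in range(40)]
print(nk_fitness(string, landscape))
```

With k = 0 each table has two entries and the contributions decouple, which is the single-optimum case mentioned above.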

Let us first restrict our attention to schematas of two definite bits (N2 = 2). There are 
three possible situations for a 2-schemata. Either the two bits are not connected by the 
fitness landscape, one bit is the connected partner of the other, or the bits are connected 
both ways. This last situation is improbable for N >> 1, so we will focus on the first 
two cases. If one has two unrelated bits in an otherwise random initial population, the 
effective fitness of each bit in this schemata is equal to an average of four of the eight 
random numbers in the fitness table at that site, because one of the three bits is fixed and 
the other two are picked at random. If, on the other hand, one of the bits is connected to 
the other then its fitness contribution is given by averaging over the two possible values 
that the other connected partner can take. This is an average of two out of the eight 
random numbers in the fitness table. The key point is that the average of two random 
numbers typically differs from 1/2 more than the average of four. Therefore, schematas 
which include landscape-related bits will have a stronger selective coefficient, in absolute 
value. This leads to a bias for the condensation of schematas that recognize the structure 
of the fitness landscape. 

In order to make this argument more precise, we need to compute the expectation
value of the best of $m_1$ averages of $m_2$ random numbers, where each random number is
uniformly distributed in the unit interval. The probability distribution of the best of $m_1$
averages of $m_2$ random numbers is equal to the derivative of the probability that z is
larger than all $m_1$ averages. If we call $P(x_1, \ldots, x_{m_1})$ the distribution of the averages, the
probability that z is greater than all of the averages is

$$p(z > \sup(m_1)) = \int_0^1 dx_1 \cdots \int_0^1 dx_{m_1}\, P(x_1, \ldots, x_{m_1}) \prod_{i=1}^{m_1} \theta(z - x_i). \tag{73}$$

Since the $m_1$ averages are statistically independent in this case, this expression reduces to

$$p(z > \sup(m_1)) = \left( \int_0^1 dx\, P(x)\, \theta(z - x) \right)^{m_1}. \tag{74}$$

The expectation value of the best of the $m_1$ averages is

$$\langle z_{\max} \rangle = \int z\, p'(z)\, dz. \tag{75}$$

For our purposes it is sufficient to consider the cases where $m_1, m_2 \in \{2, 4\}$. For
$m_2 = 2$ the distribution of the average of two uniformly distributed random numbers is
given by

$$P(x) = 4x \quad \text{for } x < 1/2$$

and the symmetry condition $P(1/2 + x) = P(1/2 - x)$. The expectation values for the
best of $m_1$ such averages are: for $m_1 = 2$, $m_2 = 2$, $\langle z_{\max} \rangle = 0.6167$; while for $m_1 = 4$,
$m_2 = 2$, $\langle z_{\max} \rangle = 0.7300$. For $m_2 = 4$ (averages of four uniformly distributed variables),
one has

$$P(x) = \frac{128}{3}\, x^3 \quad \text{for } x < 1/4, \tag{76}$$

$$P(x) = \frac{128}{3}\, x^3 - \frac{512}{3} \left( x - \frac{1}{4} \right)^3 \quad \text{for } 1/4 < x < 1/2, \tag{77}$$

and the symmetry condition given above. One finds the following result: for $m_1 = 2$,
$m_2 = 4$, $\langle z_{\max} \rangle = 0.5673$. Finally, the expectation value of the best of $m_1$ uniformly
distributed random variables is

$$\langle z_{\max} \rangle = \frac{m_1}{m_1 + 1}. \tag{78}$$

Here we will need only the best of eight, which is equal to 8/9 = 0.8889.
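
As a numerical cross-check of two of these values, one can use the equivalent form $\langle z_{\max} \rangle = \int_0^1 (1 - F(z)^{m_1})\, dz$, where F is the cumulative distribution of a single average (this follows from the expression for $\langle z_{\max} \rangle$ by an integration by parts). The sketch below is only an illustration: the function names are hypothetical, the midpoint rule is an arbitrary choice of quadrature, and only the cases $m_1 = m_2 = 2$ and the best of eight uniform numbers are evaluated.

```python
def triangular_cdf(x):
    """CDF of the average of two independent uniform numbers on [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 2.0 * x * x if x < 0.5 else 1.0 - 2.0 * (1.0 - x) ** 2

def uniform_cdf(x):
    """CDF of a single uniform number on [0, 1]."""
    return min(1.0, max(0.0, x))

def expected_best(cdf, m1, steps=100_000):
    """<z_max> as the integral of 1 - F(z)**m1 over [0, 1], by the midpoint rule."""
    h = 1.0 / steps
    return sum(1.0 - cdf((i + 0.5) * h) ** m1 for i in range(steps)) * h

print(round(expected_best(triangular_cdf, 2), 4))  # m1 = 2, m2 = 2: 0.6167
print(round(expected_best(uniform_cdf, 8), 4))     # best of eight: 0.8889
```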

As mentioned previously, we will consider only schematas with two definite bits. If the 
two bits are not related by a landscape connection the effective fitness of any one of these 
bits in a random population is given by the average of four random numbers from the 
fitness table, where the averaging is over the values of the two connected partners which 
determine the fitness contribution of this bit. Thus, the best schemata can be expected to 
have a selective advantage 

$$s_1 = \frac{4}{N}\,(0.567 - 0.5). \tag{79}$$

Now, if there is a landscape connection between the two bits of the schemata, the con- 
tribution of one of these bits to the string fitness is given by an average of two random 
numbers, since we only need to average over the other connected partner which is not in 
the schemata. The best schemata in this case will have a selective advantage 

$$s_2 = \frac{4}{N} \left( \frac{0.567 + 0.73}{2} - 0.5 \right). \tag{80}$$



In the case N = 40 analyzed in the previous section, the ratio of the growth rates of a 
2-schemata which recognizes a landscape connection to that of one that doesn't is 

$$r = \frac{1 + s_2}{1 + s_1} = 1.0081. \tag{81}$$
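
The arithmetic behind this ratio can be reproduced directly. The short sketch below assumes that both selective advantages have the form (4/N)(z - 1/2), with N = 40 and the quoted expectation values 0.567 (best of two averages of four) and 0.73 (best of four averages of two); the variable names are illustrative.

```python
N = 40                  # string length used in the text
z_avg4_best2 = 0.567    # quoted best of two averages of four
z_avg2_best4 = 0.73     # quoted best of four averages of two

s1 = (4 / N) * (z_avg4_best2 - 0.5)                       # unrelated bits
s2 = (4 / N) * ((z_avg4_best2 + z_avg2_best4) / 2 - 0.5)  # one landscape connection
r = (1 + s2) / (1 + s1)
print(f"s1 = {s1:.5f}, s2 = {s2:.5f}, r = {r:.4f}")       # r = 1.0081
```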

This result should be compared to the effect of crossover which we computed in the k = 0
landscape at the second time step: $1 + s_l$ was found to vary between 1.0025 at l = N/2
and 1.0029 at l = 2 and l = N - 1. Clearly, the conclusion is that landscape correlations should
be taken into account in a proper analysis of the condensation of "schematas".

In our discussion we neglected the possible existence of frustration and assumed that 
the fitness contribution of the two bits of the schemata could be optimized independently 
without affecting the mean fitness contribution of the other bits in the string. A more
careful analysis including frustration would be much more complicated; however, one expects
that, at least for small k, the effect of frustration should be marginal and that our
conclusions should hold qualitatively. Of course, there are fewer 2-schematas that recognize
a landscape connection than not, so the overall contribution of such schematas to the
condensation of effective degrees of freedom is diluted by a phase space factor 2/N, relative
to 2-schematas of landscape-unrelated bits. Thus, one expects that the first stage of divergence from a
random population will be dominated by schematas which do not understand the fitness 
landscape. The landscape-related schematas, which grow at a faster rate, will eventually 
overcome the contrary phase space factor and become more important in the condensation 
process. 

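Returning to the fundamental equation (24) for the growth of in-schemata fitness, we
can evaluate the effect of crossover in a k = 2 landscape by calculating $\Delta_l$ in the first step
away from a random population:

$$\Delta^{(n)}_{N_2=2} = \left\langle \frac{1}{N}\, \delta f_\xi^2 \right\rangle^{(n)} - \frac{p_c}{N-1}\, \langle l - 1 \rangle^{(n)} \left\langle \frac{1}{N}\, \delta f_\xi \left( \delta f_\xi - \frac{N_L}{N_2}\, \delta f_{\xi_L} - \frac{N_R}{N_2}\, \delta f_{\xi_R} \right) \right\rangle^{(n)},$$

where we have used the identity, valid in a random population, $\sum_{\text{words}} \delta f_{\xi_L}\, \delta f_{\xi_R}\, \delta f_\xi = 0$,
and the average $\langle\,\cdot\,\rangle^{(n)}$ runs over the set of all schematas with $N_2 = 2$ definite bits with
n = 0, 1, 2 landscape connections between schemata bits. We are also assuming that there
is no explicit l dependence in the fitness landscape itself.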

The evaluation of $\delta f_\xi$ depends on the number of in-schemata connections. One must
evaluate the contribution of each of the two bits in the schemata. If there are no
in-schemata connections then the averaging over unspecified bits leads to a contribution to
$\delta f_\xi$ equal to the average of four of the eight random numbers in the fitness table. If one of
the bits is connected to the other, then in evaluating its fitness contribution one has only
one unspecified bit and the contribution to $\delta f_\xi$ turns out to be the average of two of the
eight random numbers. The values of $\delta f_{\xi_L}$ and $\delta f_{\xi_R}$ are always given by the average of
four random numbers.

Thus, if there are no in-schemata connections then $\delta f_\xi = \frac{N_L}{N_2}\, \delta f_{\xi_L} + \frac{N_R}{N_2}\, \delta f_{\xi_R}$, and the
contribution of the crossover term vanishes as in the k = 0 case. If we denote by $\sigma^2$ the
variance of the random number distribution used to generate the tables of eight possible
fitness contributions for each bit, the averaging over schematas with n = 0 landscape
connections gives

$$\Delta^{(0)}_{N_2=2} = \left\langle \frac{1}{N}\, \delta f_\xi^2 \right\rangle^{(0)} = \frac{\sigma^2}{2N}.$$

On the other hand, if there is one in-schemata connection then $\langle \delta f_\xi^2 \rangle^{(1)}$ is the variance
of the average of two random numbers plus the variance of an average of four, while one
of the terms $\langle \delta f_\xi\, \delta f_{\xi_L} \rangle^{(1)}$ or $\langle \delta f_\xi\, \delta f_{\xi_R} \rangle^{(1)}$ is equal to half the variance of two random
numbers, the other being the variance of an average of four. Using $\langle l - 1 \rangle = (N + 1)/3$,
one finds

$$\Delta^{(1)}_{N_2=2} = \left( \frac{3}{4} - \frac{p_c (N+1)}{12 (N-1)} \right) \frac{\sigma^2}{N} = \Delta^{(0)}_{N_2=2} + \Delta^{(0)}_{N_2=2} \left( \frac{1}{2} - \frac{p_c (N+1)}{6 (N-1)} \right).$$

Similarly, for n = 2 in-schemata connections,

$$\Delta^{(2)}_{N_2=2} = \Delta^{(0)}_{N_2=2} + \Delta^{(0)}_{N_2=2} \left( 1 - \frac{p_c (N+1)}{3 (N-1)} \right).$$

In these expressions, the $p_c$-independent correction is the result of the selective advantage
of schematas that recognise landscape connections, which we discussed previously. These
numbers appear somewhat magnified relative to r. This is only because here we are
examining the in-schemata fitness per bit, whereas r was associated with the growth rate of
the entire string. The crossover contribution reduces this correlating effect of the landscape,
but only by a factor of 2/3 in the limit $p_c \to 1$, $N \to \infty$. In conclusion, schematas which
reflect the landscape connections contribute more (per bit) to the growth of fitness than
schematas involving unrelated bits.
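
The size of the crossover correction is easy to tabulate. The sketch below takes the n = 1 gain relative to the n = 0 value to be 1/2 - p_c (N + 1)/(6 (N - 1)), as in the expression above, and checks the average separation of two definite bits by brute-force enumeration; the function names are hypothetical and exact rational arithmetic is used only for convenience.

```python
from fractions import Fraction

def mean_separation(n_bits):
    """Average of l - 1 over all pairs of distinct positions on the string."""
    pairs = [(i, j) for i in range(n_bits) for j in range(i + 1, n_bits)]
    return Fraction(sum(j - i for i, j in pairs), len(pairs))

def one_connection_gain(p_c, n_bits):
    """(Delta^(1) - Delta^(0)) / Delta^(0) = 1/2 - p_c (N + 1) / (6 (N - 1))."""
    return Fraction(1, 2) - Fraction(p_c) * (n_bits + 1) / (6 * (n_bits - 1))

print(mean_separation(40))                    # 41/3, i.e. (N + 1)/3 for N = 40
print(float(one_connection_gain(1, 40)))      # gain at p_c = 1, N = 40
print(float(one_connection_gain(1, 10**6)))   # approaches 1/3, i.e. 2/3 of 1/2
```

Without crossover (p_c = 0) the gain is 1/2; at p_c = 1 and large N it is reduced to 1/3, which is the factor of 2/3 quoted in the text.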

A similar conclusion can be expected to hold if one considers larger schematas with 
N2 > 2. Extending the argument to general schematas one is led to consider fitness trees: 
the fitness tree of a bit is the set which consists of the bit itself, its connected partners, the 
connected partners of these connected partners, and so on. We can define an order n
truncated fitness tree by truncating this procedure after n steps. The dominant value of 
n depends on the degree of order in the system, which is a function of the mutation rate. 
For a high mutation rate one expects the gene pool to be highly disordered and effective 
degrees of freedom are mostly single bits (n = 0) or truncated fitness trees with small 
values of n. As the mutation rate decreases larger trees can condense and the dominant 
value of n increases. This leads us to propose the following conjecture on the nature of the 
effective degrees of freedom, which we shall call the "fitness tree hypothesis".

• The effective degrees of freedom of genetic algorithms with Nk fitness landscapes 
are the truncated order n fitness trees. The effective value of n increases as the 
condensation process allows for an increasingly structured gene pool. 
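
To make the notion of a truncated fitness tree concrete, the following sketch collects the bits reached from a given bit by following at most n landscape connections. The function name and the six-bit connection table are hypothetical and serve only as an illustration.

```python
def truncated_fitness_tree(bit, partners, order):
    """Bits reached from `bit` by following at most `order` landscape connections."""
    tree = {bit}
    frontier = {bit}
    for _ in range(order):
        # Connected partners of the current frontier that are not yet in the tree.
        frontier = {p for b in frontier for p in partners[b]} - tree
        tree |= frontier
    return tree

# Hypothetical k = 2 connection lists for a 6-bit string (illustration only).
partners = {0: [1, 2], 1: [3, 4], 2: [0, 5], 3: [0, 1], 4: [2, 5], 5: [3, 4]}
print(sorted(truncated_fitness_tree(0, partners, 0)))  # [0]
print(sorted(truncated_fitness_tree(0, partners, 1)))  # [0, 1, 2]
print(sorted(truncated_fitness_tree(0, partners, 2)))  # [0, 1, 2, 3, 4, 5]
```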

In order to test this hypothesis we designed a numerical simulation with a population 
of 1000 individuals in an Nk landscape, with N = 40 and k = 2. The crossover probability 
was taken to be equal to one. The spatial correlation function measures the correlation of 
bits at distance d along the string and tests the block hypothesis directly. A second corre- 
lation function measures this correlation as a function of the connective distance between 
bits, defined as the smallest number of landscape connections from one bit to the other. 
The results are shown in [Figures 2 a-c]. At generation 15 ([Fig. 2a]) the spatial correlation 
function reflects the preference for small schematas, as suggested by the block hypothe- 
sis. After 100 generations ([Fig. 2b]) the spatial correlation function has become weak and 
roughly independent of the distance; on the other hand the correlation of landscape-related 
bits becomes significant at connective distance one. By generation number 150 one finds 
statistically significant correlations up to connective distance four, which are progressively 
reinforced. In [Figure 2c] we show the correlation functions at generation 200. Since the 
mutation rate is equal to zero in these simulations, population diversity eventually de- 
creases and becomes insufficient to derive statistically relevant correlation coefficients. At 
generation 350 the strings are totally condensed up to connective distance two (the first 
two correlation coefficients are equal to one); the gene pool is completely organized at the 
500th generation.
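
The connective distance used in the second correlation function can be computed by a breadth-first search over the landscape connections. The sketch below is not the code used for the simulations: it assumes that connections may be traversed in either direction, and the function name and the four-bit example are hypothetical.

```python
from collections import deque

def connective_distances(source, partners):
    """Smallest number of landscape connections from `source` to every reachable bit.

    Assumption: a connection from bit a to bit b can be followed both ways.
    """
    neighbours = {b: set() for b in partners}
    for b, linked in partners.items():
        for p in linked:
            neighbours[b].add(p)
            neighbours[p].add(b)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        b = queue.popleft()
        for nb in neighbours[b]:
            if nb not in dist:
                dist[nb] = dist[b] + 1
                queue.append(nb)
    return dist

# Hypothetical ring of four bits, each connected to the next one.
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
print(connective_distances(0, ring))
```

Averaging the absolute correlation between bits over all pairs at a given connective distance then gives one point of the correlation function; pairs at a given distance d along the string give the spatial one.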

Throughout this article, with the exception of the numerical experiments, finite size
effects were neglected. If one considers their contribution, the failure of the block hypothesis
only becomes more apparent. Here we will mention only briefly two arguments to this 
effect. 

In a finite population the difficulty of finding a good schemata must be considered, 
since not all schematas are present in the initial population. Since the number of schematas
with fixed $N_2$ grows with l as ${}^{l-2}C_{N_2-2}$, one expects it to be easier to discover good large
schematas than small ones. Another important finite size effect is the effective non-linearity
of selection emphasised in the Neutral Theory of Molecular Evolution [17]: schematas
with only weak selective coefficients are not necessarily selected, as the neutral drift due
to fluctuations in the selection of parents dominates over selection unless $|s_{\mathrm{eff}}| > 1/P$, P
being the effective breeding population. This leads to an effective non-linearity of selection
due to the existence of a threshold in favour of schematas with a selective coefficient above
this value. Since the selective coefficient of a schemata grows in proportion to $N_2$, this
effect favours schematas with large $N_2$. Combining this result with the previous comment
on the probability to find good schematas being proportional to ${}^{l-2}C_{N_2-2}$, we find that
schematas with small values of l are strongly disfavoured by the finite size effects.
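
Both finite size arguments can be illustrated numerically. In the sketch below the binomial coefficient counts the placements of the $N_2 - 2$ interior definite bits of a schemata of length l, and the threshold test is the criterion $|s_{\mathrm{eff}}| > 1/P$ quoted above. The function names, the choice $N_2 = 4$, the sample coefficient 0.0067 and the population sizes are assumptions made for illustration only.

```python
from math import comb

def schemata_count(length, n_defined):
    """Placements of the interior definite bits when both end bits are fixed."""
    return comb(length - 2, n_defined - 2)

def selection_dominates(s_eff, breeding_population):
    """Threshold criterion |s_eff| > 1/P for selection to beat neutral drift."""
    return abs(s_eff) > 1.0 / breeding_population

for length in (5, 10, 20, 40):
    print(length, schemata_count(length, 4))   # 3, 28, 153, 703

print(selection_dominates(0.0067, 1000))  # True
print(selection_dominates(0.0067, 100))   # False
```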

7. Conclusions

The bulk of this paper has been devoted to deriving equations that describe the evo- 
lution of string populations in genetic algorithms, and in particular how effective degrees 
of freedom may emerge during this evolution. We started with an equation that governed 
the evolution of the strings themselves under the joint action of selection, mutation and 
crossover. We found that this equation could be elegantly expressed in terms of the
evolution of a string $C_i$ and its subcomponents relative to the crossover point, $C_i^L$ and $C_i^R$.
This naturally introduced the notion of a coarse graining relative to a description in terms
of the strings themselves, the coarse graining being associated with sums over strings that
contained a part of $C_i$. Subsequently we derived an analogous equation for the evolution
of schematas, this time in terms of a schemata and its constituent parts. Schemata
evolution is coarse grained relative to string evolution because of the summing over the
$N - N_2$ non-schemata bits. The evolution of a schemata of order $N_2$ is described in terms
of its constituent parts, which are schematas of order less than $N_2$. Thus the action of
crossover invokes a natural hierarchy of coarse grainings. Such a hierarchy is reminiscent 
of a renormalization group transformation where there is a coarse graining over a subset 
of degrees of freedom, such as in the one-dimensional Ising model where one may sum over 
every other spin in the partition function for instance. In the genetic algorithm case this 
coarse graining stops naturally when one arrives at the evolution of 1-schematas as these 
are not decomposable into even more coarse grained degrees of freedom. 

In one sense it is remarkable that one may solve a genetic algorithm analytically, albeit
for a simple fitness landscape and over a short time interval. What is lacking, however, is a
reasonable approximation scheme with which one may attack the evolution equations. Just
as solving an exact renormalization group equation is almost impossible, so with genetic
algorithms finding exact solutions is probably hopeless. However, implementing
renormalization group transformations approximately has had remarkable success in explaining
many physical phenomena. We hope that finding analogous techniques in the study of 
genetic algorithms might lead to similar success. 

Starting from the evolution equation for schematas, a further coarse-graining was
performed to arrive at an expression for the average contribution of all schematas of size l
to the improvement of fitness. Applying this equation to the particular case of a k = 0
landscape, where each bit contributes independently to fitness, we showed that the net
effect of crossover on fitness growth is slightly positive for all l: the effect of schemata
reconstruction always exceeds that of destruction! Schematas that are either much smaller
or much larger than half the string size are most enhanced.

A different situation arises if one considers a k > 0 landscape. In this case the sum
of the effective selective advantages of the parts of a schemata is not necessarily equal to
the effective selective advantage of the entire schemata. Only when the parts of a selected
schemata are less selected than the whole (the deceptive case) does crossover lead to a net
destructive force, as schematas are broken down into pieces which are then lost due to their
low selectivity. The schematas that are selected over a long time scale are those that break
down into useful parts, independently of their size. 

Finite size effects break the apparent symmetry of the geometrical effect of crossover
about l = N/2: the existence of a selection threshold favours highly fit schematas with a
large number of specified bits $N_2$, and these can be found with a reasonable probability only
if their length l is large. Combining this argument with the l-dependence of in-schemata
fitness growth $\Delta_l$ one concludes that the effective degrees of freedom will be schematas
with large $N_2$ and l > N/2.

This conclusion has important and surprising consequences for the designer of Genetic 
Algorithms. It is often thought that GA designers should strive to find a coding such that
bits that "cooperate" are placed near each other on the chromosome, so as to resist the 
destructive effect of crossover. This is generally speaking a very difficult task, since the 
structure of the optimisation problem usually does not match the linear topology of the 
strings. Our results show that this task is pointless: if anything one should try to place 
cooperating bits as far from each other as possible. Of course this is the most probable 
outcome if no attention is paid to the linear disposition of the bits, so this is not a
problem one should worry about. 

We should stress that the above comment by no means implies that the choice of 
coding is irrelevant. The choice of a genetic interpreter is crucial to generate a high 
density of states near desired fitness extrema and perhaps also to guide the emergence 
of an algorithmic language [18] which facilitates the search for new highly fit schematas. 
These issues however lie beyond the scope of the present paper. 

With the results of this paper in mind it is interesting to recall the analogy between 
GA's and spin glass dynamics discussed in the introduction. In both cases one is describing 
a condensation process in a rugged landscape, guided by the emergence of overlaps with 
certain structures or "patterns". One of the chief reasons why in GA's the overlaps with
schematas are considered rather than with entire strings ($N_2 = N$ schematas) is that genetic
populations are generally too disordered for such a rigid structure as a completely-specified 
string to be of much relevance. Of course the same can be said of spin glasses far from 
equilibrium. This suggests that the notion of "schemata" may find some usage to study the 
condensation of spin glasses from an initial disordered phase. One can carry the analogy 
between GA's and spin glasses one step further and suggest that, in the case of sparsely
connected neural networks, the truncated connective trees may form a privileged class of
schematas for the purpose of developing an effective theory of neural dynamics.

FIGURE CAPTIONS



Figure 1. The multiplicative renormalization of the effective fitness due to crossover,
$(1 + s_l)$, is represented as a function of the schemata length l. The crossover term
gives a positive contribution to fitness growth for all values of l, which is greater for
schemata sizes that are either much smaller or much larger than half the chromosome
size.



Figure 2. The average absolute correlations between bits in the chromosome are given
in terms of (B) the linear distance which separates the bits on the chromosome, and
(C) the connective distance, defined as the smallest number of landscape connections needed to
go from one bit to the other. Very early on one notes a slight preference for correlations
between bits that are near each other on the chromosome, i.e. with l << N (2a). By
t = 100 the correlations between landscape-related bits become important (2b), and
they come to dominate at t = 200 (2c). At this point the population is highly organised
and correlations on the basis of linear chromosome distance are no longer significant.